Re: A simpler way to tokenize and parse?

Kaz Kylheku <864-117-4973@kylheku.com>
Sun, 26 Mar 2023 18:19:23 -0000 (UTC)

From comp.compilers


From:	Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups:	comp.compilers
Date:	Sun, 26 Mar 2023 18:19:23 -0000 (UTC)
Organization:	A noiseless patient Spider
References:	23-03-011 23-03-019
Injection-Info:	gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="4491"; mail-complaints-to="abuse@iecc.com"
Keywords:	Lisp, syntax
Posted-Date:	26 Mar 2023 20:52:21 EDT

On 2023-03-25, Lieven Marchand <mal@wyrd.be> wrote:
> Roger L Costello <costello@mitre.org> writes:
>
>> I have done some work with Flex and Bison and recently I've done some work
>> with building parsers using read. My experience is the latter is much easier.
>> Why isn't read more widely discussed and used in the compiler community?
>> Surely the concept that read embodies is not specific to Lisp and Scheme,
>> right?
>
> Apart from the already mentioned problem that it forces you into a
> syntax that a lot of people don't like, there's also the problem that
> you have to deal with hostile input. Where you expect "(+ 2 3)" someone
> will enter "(+ 2 3 #.(progn (launch-the-nukes) 4))". A lot of security

Not every Lisp dialect has hash-dot read-time evaluation; that's
a feature of Common Lisp, disabled by setting/binding *read-eval*
to nil. I don't seem to recall that Scheme has it. I deliberately
kept it out of TXR Lisp.

However, that doesn't disable compile-time evaluation in macros,
which kicks in if you feed the code you have read to the compile
function. From a security POV, compile must be regarded the same
as eval. We are seeing compile-time evaluation in newer languages,
though.

It's not a bona-fide security issue, except in applications that
dynamically compile untrusted input. Since the aim is almost
always to execute that input, it makes little difference whether
the malice happens at compile time or run time. Both have to be
sandboxed.

When you're building an open-source program, it's a given that
you're running its code: shell scripts, make files or what
have you. It doesn't need read-time evaluation to perpetrate
malice.

> problems in real world settings come from not correctly validating
> inputs and by the time you have worked around all these problems read
> isn't all that easy anymore. C for example has a somewhat similar
> facility scanf that tries to pattern match input and is also considered
> unsafe.

scanf is unsafe, but not in the way that hash-dot read-time evaluation
is unsafe. The situations are not comparable.

scanf doesn't feature a documented, reliable scan-time programming language
that the *input* can use to extend the program which calls scanf,
and which can be turned off by a flag.

You have to exploit a buffer overflow or whatever, enabled by careless
use of scanf.

All they have in common is that calling read on untrusted input without
disabling *read-eval* is a kind of careless use of read.

But setting *read-eval* to nil is something you may be able to do in
just one place in the entire application. (Anything which needs
*read-eval* can opt in using (let ((*read-eval* t)) ...) around its
calls to read.)

> A good rule of thumb for production ready software is to define
> a grammar for valid input and provide a validating parser.

Sure, if you want to waste your time defining grammars and
writing validating parsers.

This is no longer done that much outside of the Lisp world. People use XML,
JSON, YAML, ..., whose grammar they definitely didn't design or
implement, and validate the content/shape of the object that comes out.
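
To make that concrete, here is a rough sketch in Python of the
"parse with a stock parser, then validate the shape" approach. The
request format (an object with "op" and "args" keys) and the function
name parse_sum_request are made up for illustration; only json.loads
is a real library call.

```python
import json


def parse_sum_request(text):
    """Parse with the stock JSON parser, then validate the resulting object.

    The {"op": "+", "args": [...]} format is a hypothetical example,
    standing in for the "(+ 2 3)" input discussed above.
    """
    # The grammar is not ours: json.loads raises json.JSONDecodeError
    # (a subclass of ValueError) on malformed input.
    data = json.loads(text)

    # Everything below checks shape and content, not syntax.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"op", "args"}:
        raise ValueError("expected exactly the keys 'op' and 'args'")
    if data["op"] != "+":
        raise ValueError("unsupported operator")

    args = data["args"]
    if not isinstance(args, list) or not args:
        raise ValueError("'args' must be a non-empty array")
    for arg in args:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(arg, bool) or not isinstance(arg, (int, float)):
            raise ValueError("'args' must contain only numbers")

    return sum(args)
```

Calling parse_sum_request('{"op": "+", "args": [2, 3]}') returns 5,
while anything with the wrong shape is rejected with ValueError
before the application acts on it. No grammar was written; the only
hand-written part is the check on the object.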

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
