A simpler way to tokenize and parse?

Roger L Costello <costello@mitre.org>
Fri, 24 Mar 2023 14:45:40 +0000

          From comp.compilers

From: Roger L Costello <costello@mitre.org>
Newsgroups: comp.compilers
Date: Fri, 24 Mar 2023 14:45:40 +0000
Organization: Compilers Central
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="84956"; mail-complaints-to="abuse@iecc.com"
Keywords: Lisp, lex, comment
Posted-Date: 24 Mar 2023 18:43:03 EDT

Hello Compiler Experts!


I am reading the book "Programming Languages: Application and Interpretation"
by Shriram Krishnamurthi.


The book says that Lisp and Scheme have a primitive called "read".


The book says, "The read primitive is a crown jewel of Lisp and Scheme."


Some of my notes from reading the book:


- Read does tokenizing and reading.
- Read returns a value known as an s-expression.
- The s-expression is an intermediate representation.
- The output of read is either an atom (such as a number or a symbol) or a list. That's it!


Example of tokenizing/parsing using read:


(+ 3 4) --> read --> (list `+ 3 4) --> parse --> (add (num 3) (num 4))


The first expression (+ 3 4) is the concrete syntax.
The middle expression (list `+ 3 4) is an s-expression. It is an intermediate
representation.
The last expression (add (num 3) (num 4)) is the abstract syntax.
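
To make the first arrow concrete, here is a minimal sketch of a read-like
function in Python. It is not the Lisp or Scheme primitive. The names
tokenize, _read_from and read_sexpr are hypothetical, the sketch handles only
integers, symbols and parenthesized lists (no quoting, strings or comments),
and it represents symbols as plain Python strings.

```python
def tokenize(text):
    """Split the text into parentheses and whitespace-separated atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def _read_from(tokens, pos):
    """Read one s-expression starting at tokens[pos]; return (value, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        # Collect nested s-expressions until the matching close paren.
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = _read_from(tokens, pos)
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing ')'")
        return items, pos + 1
    if tok == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(tok), pos + 1   # atom: integer
    except ValueError:
        return tok, pos + 1        # atom: symbol, kept as a string (assumption)


def read_sexpr(text):
    """Turn concrete syntax into a nested list of atoms."""
    tokens = tokenize(text)
    value, pos = _read_from(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("trailing input after s-expression")
    return value


print(read_sexpr("(+ 3 4)"))   # ['+', 3, 4]
```

Note that the result says nothing about what + means. It only records the
nesting, which is the point of the intermediate representation.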


The book says: read is one of the great ideas of computer science. It helps
decompose a fundamentally difficult process - generalized parsing of the input
stream - into two simple processes:


(1) reading the input stream into an intermediate representation
(2) parsing that intermediate representation
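
Step (2) can be sketched in Python as well, assuming step (1) has already
produced a nested list in which symbols are strings and numbers are ints. The
classes Num and Add and the function parse are hypothetical names chosen to
mirror the (add (num 3) (num 4)) notation above, and only two-argument
addition is supported.

```python
from dataclasses import dataclass


@dataclass
class Num:
    n: int


@dataclass
class Add:
    left: object
    right: object


def parse(sexpr):
    """Convert a nested-list s-expression into an abstract syntax tree."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(sexpr, int) and not isinstance(sexpr, bool):
        return Num(sexpr)
    if isinstance(sexpr, list) and len(sexpr) == 3 and sexpr[0] == "+":
        return Add(parse(sexpr[1]), parse(sexpr[2]))
    raise SyntaxError(f"cannot parse {sexpr!r}")


print(parse(["+", 3, 4]))   # Add(left=Num(n=3), right=Num(n=4))
```

Because the input is already a tree of lists, this parser is a short
structural recursion with no token stream or lookahead to manage.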


I've read several compiler books and none of them talked about this. They talk
about creating a lexer to generate a stream of tokens and a parser that
receives the tokens and arranges them into a tree data structure. Why no
mention of the "crown jewel" of tokenizing/parsing? Why no mention of "one of
the great ideas of computer science"?


I have done some work with Flex and Bison and recently I've done some work
with building parsers using read. In my experience, the latter is much easier.
Why isn't read more widely discussed and used in the compiler community?
Surely the concept that read embodies is not specific to Lisp and Scheme,
right?


/Roger
[Yes, it's specific to Lisp and Scheme. They have an extremely simple
syntax called S expressions, nested parenthesized lists of
space-separated tokens with some quoting. The original plan was that Lisp 2
would have M expressions that looked more like a normal language but
it's over 50 years later and they still haven't gotten around to it.
-John]

