Re: A minimal LL(1) parser generator ?

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Wed, 22 Jan 2020 17:08:19 GMT

          From comp.compilers

Related articles
[10 earlier articles]
Re: A minimal LL(1) parser generator ? rockbrentwood@gmail.com (2020-01-04)
Re: A minimal LL(1) parser generator ? gaztoast@gmail.com (honey crisis) (2020-01-05)
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com (2020-01-05)
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com (2020-01-05)
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com (2020-01-05)
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (2020-01-22)
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (2020-01-22)
Re: A minimal LL(1) parser generator ? carlglassberg@gmail.com (2020-01-23)
Re: A minimal LL(1) parser generator ? anton@mips.complang.tuwien.ac.at (2020-01-25)
Re: A minimal LL(1) parser generator ? FredJScipione@alum.RPI.edu (Fred J. Scipione) (2020-01-25)
| List of all articles for this month |

From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Wed, 22 Jan 2020 17:08:19 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 19-12-016 20-01-005
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="8496"; mail-complaints-to="abuse@iecc.com"
Keywords: LL(1)
Posted-Date: 23 Jan 2020 10:28:45 EST

rockbrentwood@gmail.com writes:
>On Sunday, December 22, 2019 at 10:17:44 AM UTC-6, Andy wrote:
>> ANTLR has even LL(*) but is too complicated. I am searching maximal
>> simple and elegant generator which generates function call like
>> written by hand.
>
>A large set of parsers are lined up in the parser generator comparison on
>Wikipedia here
>https://en.wikipedia.org/wiki/Comparison_of_parser_generators
>
>The question of who in the list does bona fide code synthesis (as opposed to
>cookie-cutter code generation) is not directly addressed, as far as I can see.
>But the items can be reviewed individually.


It's unclear to me what you mean with those two terms. "Like written
by hand" is somewhat clearer in that I don't write code manually that
automata-based generators generate. The code generated by generators
of recursive-descent parsers should look more like hand-written code;
especially if you optimize for that. In general, though there are a
number of reasons why the code will deviate from that ideal:


* The interface to the scanner is narrow and leads to stylized code
    where a hand-written parser might make use of knowledge of the
    scanner (i.e., use a wider interface).


* Some things are simpler for a human, and others are simpler for a
    code generator, so if one does not optimize for looking like code
    written by a human, the result will look different.


As an example, consider the following rule in Gray:


nonterminal expr
...
(( (( term
      || "-" term {{ 0 swap - }} ))
      (( "+" term {{ + }}
      || "-" term {{ - }} )) **
)) expr rule ( -- n )


(This is for a simple calculator).


Gray generates code for the rule that Gforth decompiles as:


noname :
    <term+$D18> testsym
    IF <term+$A8> @ execute
    ELSE <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$158>
    THEN
    BEGIN <term+$D58> testsym
    WHILE <")"+$A0> testsym
                  IF <")"+$A0> testsym ?readnext <term+$A8> @ execute <term+$428>
                  ELSE <"+"+$A0> testsym ?readnext <term+$A8> @ execute <term+$638>
                  THEN
    REPEAT ;


You see control structures similar to what a human would write. The
|| results in an IF...ELSE...THEN, the ** in a BEGIN...WHILE...REPEAT.
That part is fairly straightforward and does not need a detour through
state machines (at least as long as you stay with LL(1) grammars).


But you also see that the word has no name and lots of the words it
calls have no name, either (all the <...+$...> things are addresses of
unnamed words); that's because it's easier for the generator to deal
with addresses than to generate new names. A human would use names
instead.


You also see occurences of testsym and ?readnext that are the
interface to the scanner.


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.