Branched gotos was: Re: A minimal LL(1) parser generator ?

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Mon, 6 Jan 2020 08:47:31 +0200

          From comp.compilers

Related articles
A minimal LL(1) parser generator ? borucki.andrzej@gmail.com (Andy) (2019-12-21)
Re: A minimal LL(1) parser generator ? rockbrentwood@gmail.com (2020-01-04)
Branched gotos was: Re: A minimal LL(1) parser generator ? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2020-01-06)
Re: Branched gotos was: Re: A minimal LL(1) parser generator ? jamin.hanson@googlemail.com (Ben Hanson) (2020-01-07)
| List of all articles for this month |

From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Mon, 6 Jan 2020 08:47:31 +0200
Organization: Compilers Central
References: 19-12-016 20-01-005
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="58192"; mail-complaints-to="abuse@iecc.com"
Keywords: code
Posted-Date: 06 Jan 2020 09:39:04 EST

rockbrentwood@gmail.com wrote:
> The question of who in the list does bona fide code synthesis (as opposed to
> cookie-cutter code generation) is not directly addressed, as far as I can see.
> But the items can be reviewed individually.


> Perhaps you will end up writing a parser generator that does this. The obvious
> "direct map" method is
> Draft Version:
> state = goto label
> state transition = goto
> general format:
> Label:
> action
> goto(s) - either single goto or branched gotos


> Synthesized Version:
> control flow structures are synthesized from this.


While I must admit that I am not sure I understand your question (your
conception of how automata work is too different than mine for me to
readily understand your points), if you are asking what I think you
are asking, we solve your question a slightly different way.


So, let me rephrase your question a little bit. When you come to the
end of a production, you have a reduce action and that causes the
popping of a non-terminal from the stack. Depending upon what you
pop, the state machine transitions to another state. This is normally
coded as the "goto table" in standard LR constructions. But the goto
table is in fact essentially a jump table, just as the shift table is
a jump table on incoming tokens.


If that is a correct interpretation of your question, then we do
something different for that, and what we do is similar to a code
generation process, and in fact, we have a version that generates the
code is if-then-else and/or case statements as appropriate, to
completely encode the tables in the target programming language. Not
recursive ascent, but more a direct implementation of the FSA as code.


So, first our LR construction is non-standard. We not only keep
tokens on our parser stack, but when we reduce a production, we put
the non-terminal itself on the stack. Thus, in a state, you might see
either a token or a non-terminal as the item you are processing. So,
the shift-table and the goto-table are merged into one table.


Now, to make that work, we don't encode what to do in the state
itself. There are no goto states. No reduce states. No accept
states. Etc. A state is simply a jump table that implements "see this
on the stack and take this transition". The transition itself encodes
what the action should be. I think this difference is similar to the
difference in Mealy and Moore machines. One encodes the output in the
state, the other on the transition (and you can transform one to the
other). I just happen to like my encoding to be all on transitions,
with states just as tables.


And we have an "assembly" language of actions that a transition can
invoke. Thus, normally, upon seeing a token, the actions are "shift;
goto state n;" two assembly language steps. At the beginning of a
non-terminal (and only then) do we want to push something on the stack
(this is similar to what Mule does and I think similar to your bracket
notation, although I have not entirely figured that out), so if the
token is the first token in a rule, it does three assembly language
steps, "push; shift, goto state n". Similarly if there is an
output/action associated with the transition, that will be added to
the assembly language sequence "shift; do act m; goto state n;". I
am not going to detail all the possible actions. There are more than
one would expect because it allows us to implement a variety of
things. (I am currently considering making the machine deal with
captures and backreferences. That is simply some more actions in the
assembly language.)


In any case, hopefully, you can imagine how we simply invoke those
actions in a switch/case statement as the steps (macro calls, function
calls, or by expanding such code inline) for that case. You can then
decide whether you want two nested switch/case statements (one on
state, one per state on input) to encode them or to merge them or
generate them as code. (At Intel, I used a similar technique to
compress the states as many states have only a few cases to consider.
So some states want to be a jump table, but many (most) states in a
FSA are simply check for one or two items and if not found fail. You
can often turn than into linear code.) There are numerous choices
that we implement as various table generation options. The internal
model of jump tables with transitions that have specific actions does
not limit that.


Best regards,
Chris


******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.