From: Stephen Horne <sh006d3592@blueyonder.co.uk>
Newsgroups: comp.compilers
Date: Mon, 20 Apr 2009 12:31:27 +0100
Organization: virginmedia.com
References: 09-04-015 09-04-034
Keywords: books
Posted-Date: 20 Apr 2009 11:27:13 EDT
On Sat, 18 Apr 2009 12:19:29 +0100, "Dean Wakerley"
<dwnewsgroups@tiscali.co.uk> wrote:
>"Modern Compiler Design", Dick Grune, Henri Bal, Ceriel Jacob ... John Wiley
>& sons. This does have diagrams you complain of.
This is a firm favorite of mine, and a new edition is coming soon. But
it does have diagrams, and IMO they are important to understanding.
However, I'm not convinced you need any visual thinking ability to
understand them. The diagrams are available to download for free from
here...
http://www.cs.vu.nl/~dick/MCD.html
An example of an important diagram is figure 2.85 on page 159 of the
book, and page 124 of the one-diagram-per-page postscript download.
This gives a state model for an LR(0) automaton, with the dotted rules
within each state. The grammar it parses is given in figure 2.85, page
153 of the book, page 120 of the download.
For me, this diagram was key to understanding LR parsing - not in a
just-look-and-visualise way, but by looking at a from-state,
transition and to-state and working out how the states relate to each
other and what the automaton is doing. There is a dry-run figure (pg.
161 of the book, 126 of the download) showing a sequence of actions
and the stack at each step, but to me, the state model itself is key.
When it comes to lexical analysis and parsing, state models are a fact
of life - you can't really avoid dealing with them - but how the state
model is presented is secondary to what it means, and there is no
just-look-and-all-is-magically-revealed presentation even if you are a
visual thinker.
So yes, the state model is presented as a diagram, but in terms of my
getting to understand it, it could just as well have been a tabular
listing of from-state, transition, to-state or whatever - just so long
as the dotted rules are all included in the state descriptions. The
point is to understand what changes with each step of the process - to
understand why each transition is possible, and why each state has the
dotted rules that it has. When you understand it on that level, the
algorithm to construct it is obvious.
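For instance, the handful of transitions worked through below, written
out in exactly that tabular form (state numbers as in the figure),
would read:

  from-state   transition   to-state
  S1           '$'          S2
  S1           '+'          S3
  S8           '+'          S3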
For me, the key points about this LR(0) state model are as follows...
If I choose S1 as the from-state to look at...
S1 : Z -> E.$
     E -> E.+ T
What's important here is that the dots represent how much of each rule
has been recognised so far, with the capital letters being
non-terminal symbols. In this state, we have always just recognised
(shifted) non-terminal E - it's just before the dot in both lines -
and the next token to recognise is either a '$' or a '+'. These are
the two possible transitions. If we follow the '$' case, we get to
state S2...
S2 : Z -> E $.
We no longer have the 'E -> E + T' rule since it didn't match the '$'.
We still have the 'Z -> E $' rule, but the dot has moved on one step.
Since the whole rule has been matched, a reduce is possible here.
That's enough to understand the handling of terminal shifts.
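To make that concrete, here's a rough Python sketch in my own encoding -
not the book's code, and the grammar behind the figure is my assumption
(Z -> E $, E -> E + T | T, T -> i | ( E ), which is what the dotted
rules above suggest). An item is a (lhs, rhs, dot-position) triple, and
shifting a symbol keeps the items with the dot just before that symbol
and moves the dot on one:

def shift(items, symbol):
    # Keep the items whose dot sits just before `symbol`, and move
    # the dot on one step. Items are (lhs, rhs, dot-position) triples.
    return frozenset(
        (lhs, rhs, dot + 1)
        for (lhs, rhs, dot) in items
        if dot < len(rhs) and rhs[dot] == symbol
    )

# S1 as quoted above: Z -> E . $  and  E -> E . + T
s1 = frozenset({('Z', ('E', '$'), 1),
                ('E', ('E', '+', 'T'), 1)})

print(shift(s1, '$'))  # {('Z', ('E', '$'), 2)} - S2: dot at end, reduce
print(shift(s1, '+'))  # {('E', ('E', '+', 'T'), 2)} - S3, pre-closure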
Looking at state S3, however...
S3 : E -> E +.T
     T -> .i
     T -> .( E )
We can reach this from S1 by shifting the '+' rather than the '$', but
based on what we know so far, we'd expect...
S3 : E -> E +.T
That is, two extra dotted rules have been added - why?
Well, the top dotted rule is ready to match T - but that's a
non-terminal, which never occurs as a simple token in the input. So we
refer back to the grammar and add the other two dotted rules: they let
us match the content of T starting from here, and they ensure that the
reduce (once that content is fully matched) returns back here (by
popping states off the stack) with the non-terminal T as the next token
to shift.
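That step - pulling in fresh dotted rules for whatever non-terminal
sits after a dot - is what makes each state self-contained. A sketch of
it in the same encoding (the grammar is still my assumption, not quoted
from the book):

GRAMMAR = [
    ('Z', ('E', '$')),
    ('E', ('E', '+', 'T')),
    ('E', ('T',)),
    ('T', ('i',)),
    ('T', ('(', 'E', ')')),
]
NONTERMINALS = {lhs for (lhs, _) in GRAMMAR}

def closure(items):
    # Whenever a dot sits before a non-terminal, add that non-terminal's
    # rules with the dot at the start; repeat until nothing new appears.
    result = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for (lhs2, rhs2) in GRAMMAR:
                    if lhs2 == rhs[dot] and (lhs2, rhs2, 0) not in result:
                        result.add((lhs2, rhs2, 0))
                        changed = True
    return frozenset(result)

# Closing the kernel item E -> E + . T adds exactly the two extra
# dotted rules discussed above.
for (lhs, rhs, dot) in sorted(closure({('E', ('E', '+', 'T'), 2)})):
    print(lhs, '->', ' '.join(rhs[:dot]), '.', ' '.join(rhs[dot:]))

The repeat-until-stable loop matters because a pulled-in rule can itself
start with a non-terminal - closing Z -> . E $, for instance, pulls in
the E rules and then the T rules via E -> . T.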
Also interesting: we can reach S3 from S8 as well as from S1. Even
though the dotted rules in S1 and S8 differ, shifting a '+' from either
state produces exactly the same set of dotted rules - and two states
with the same dotted rules may as well be the same state.
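Put shift and closure together and the construction algorithm more or
less writes itself - which, I think, is the sense in which it becomes
obvious. A sketch, again under my assumed grammar and encoding (the
state numbering will not match the book's figure):

GRAMMAR = [
    ('Z', ('E', '$')),
    ('E', ('E', '+', 'T')),
    ('E', ('T',)),
    ('T', ('i',)),
    ('T', ('(', 'E', ')')),
]
NONTERMINALS = {lhs for (lhs, _) in GRAMMAR}
SYMBOLS = {sym for (_, rhs) in GRAMMAR for sym in rhs}

def closure(items):
    # As in the previous sketch: expand dots before non-terminals.
    result = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(result):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for (lhs2, rhs2) in GRAMMAR:
                    if lhs2 == rhs[dot] and (lhs2, rhs2, 0) not in result:
                        result.add((lhs2, rhs2, 0))
                        changed = True
    return frozenset(result)

def goto(state, symbol):
    # Move the dot over `symbol`, then close the resulting kernel.
    return closure((lhs, rhs, dot + 1)
                   for (lhs, rhs, dot) in state
                   if dot < len(rhs) and rhs[dot] == symbol)

def build_automaton():
    start = closure({('Z', ('E', '$'), 0)})
    states, transitions, todo = {start}, {}, [start]
    while todo:
        state = todo.pop()
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target:
                transitions[(state, symbol)] = target
                if target not in states:  # identical item sets merge here
                    states.add(target)
                    todo.append(target)
    return states, transitions

states, transitions = build_automaton()
sources = [src for (src, sym) in transitions if sym == '+']
targets = {transitions[(src, '+')] for src in sources}
print(len(states), 'states in total')
print(len(sources), "states shift '+' into", len(targets), 'shared state')

Because each state is a frozenset of dotted rules, the merging falls
out of the set membership test for free: two routes that produce the
same dotted rules land on the same state, which is all the
S1-and-S8-into-S3 case is.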
Essentially the same thing applies to a lot of the material in MCD.
The pictures may well each be worth a thousand words, but often you
still have to take the time to work through the processes they
describe, or they're nothing more than not-very-pretty pictures. The
key is not to look at what is drawn, but to work out why it was drawn
that way.
I should point out, though, that I've rarely read much past the first
half of the book - lexical analysis, parsing, attribute handling and
some of the general code generation stuff. The memory management and
logic programming sections have occasionally caught my interest, but
much of the second half of the book deals with things that either seem
obvious to me or that I'm just not that interested in.
BTW - I have what is sometimes classed as an NvLD myself - Asperger's.
I have no problem with visual thinking, though, so long as what I need
to visualise is an abstraction. When it comes to visualising people,
real-world objects, etc., I just can't. I can't describe people I know
well, for instance, unless I made a point of verbalising and memorising
while I could see them - though I have no problem recognising them.