Re: why did you chose compiler development?

Scott Stanchfield <thetick@scruz.net>
27 May 1997 23:43:34 -0400

          From comp.compilers

Related articles
why did you chose compiler development? ast@halcyon.com (1997-05-22)
Re: why did you chose compiler development? salomon@silver.cs.umanitoba.ca (1997-05-25)
Re: why did you chose compiler development? walter@bytecraft.com (Walter Banks) (1997-05-25)
Re: why did you chose compiler development? scotts@metaware.com (Scott Stanchfield) (1997-05-25)
Re: why did you chose compiler development? thetick@scruz.net (Scott Stanchfield) (1997-05-27)
Re: why did you chose compiler development? marssaxman@sprynet.com (1997-05-27)
Re: why did you chose compiler development? ian@five-d.com (1997-05-30)
Re: why did you chose compiler development? conway@mundook.cs.mu.OZ.AU (1997-05-30)
Re: why did you chose compiler development? dwight@pentasoft.com (Dwight VandenBerghe) (1997-05-31)
| List of all articles for this month |
From: Scott Stanchfield <thetick@scruz.net>
Newsgroups: comp.compilers
Date: 27 May 1997 23:43:34 -0400
Organization: scruz-net
References: 97-05-242 97-05-283
Keywords: practice

Just because data is shared doesn't mean the modules are coupled...
(At least not by me definition -- of course I've been wrong once or
twice, or sixty times...)


Take a simple, typical PCCTS compiler design (very generalized):


-- define a scanner that says "group characters like these into logical
      units called tokens. A token is an object that represents just a
      group of characters. No other contextual information (other than
      file name/line/col for error messages).
      The scanner has a "nextToken()" routine available.


-- define a token buffer that can make several calls to the scanner's
      nextToken() routine to look any number of tokens ahead, storing
      these tokens in a dynamic buffer
      The token buffer has two main routines:
            nextToken() which grabs a token from the head of the buffer
            LT(n) which returns the nth token ahead


-- define a parser that, using the token buffer's LT() routine makes
      decisions on how to proceed in a parse. It uses nextToken() to
      tell the token buffer to toss the token at the head of the buffer
      (Note that the parse can also be directed by checking symantics of
      previously-seen token contexts. What I mean here is that if we had
      previously seen "typedef x y" we know x is a type and the next time
      we see x we can make a parsing decision knowning it is a type. NOTE:
      this semantic usage is totally within the parser -- no feedback to
      scanner. The scanner just returns basic tokens.
      The parser does the following:
          -- recognizes patterns of tokens for meaning
          -- builds an AST representing the parsed text
          -- stores detailed information about symbols used in the ASTs
                in a "symbol table" for convenient lookup and preventing
                duplication of information. (We could store this info directly
                in the tree if we were silly...)


      So after the parse, we have an AST and a symbol table describing
      some of the components in that tree. (Think of it as a glossary
      containing definitions of the words used in a paper by someone who
      writes at a post-doctorate word-use level -- noone can understand
      the paper without constantly referring to the glossary...)


-- define a tree parser (using similar notation to the stream parser)
      that walks through the AST, reorganizing it and emitting code


(The AST is optional -- could have the parser directly generate code,
but that's irrelevant to this example.)


As far as I can see, this is about as modular as you can get. I assume
that the definition of "module coupling" means that two modules are
interdependant on data and or control flow; they cannot be thought of
in terms of "black boxes."


The process is broken down into "successive globbing," from characters
to tokens, tokens to phrases, phrases to ASTs, then spit out from ASTs
to code. Nice pipeline and data analysis. And nicely modular.


I don't see how a compiler scheme (such as the above, which is a pretty
good model for most compilers) can be considered strongly coupled.
Using strictly the interfaces provided (nextToken() et al) the guts of
each part can be changed at will without regard to the guts of any other
part.


The interface is not just the call -- it's also the assumed state of
data in the system. If you'd really prefer, the symbol table can be
passed as a parameter from the front-end to back end. Doesn't really
make a difference. As long as the front end says "my END result is a
symbol table with the following information" and the back end states
that that's the state of the symbol table it expects.


Also - think of the scanner performing a singular task -- translating a
group of characters into a single token. I don't consider it several
logically-cohesive tasks, but a single physically cohesive task.


I have seen cases in the past where a yacc parser communicates state
information back to the scanner, who determines which token to send to
the parser. This is pure evil, and is frought with the danger of the
scanner sending a token to yacc as lookahead to determine if a
production should be reduced, only to have that production's action send
state to the scanner about what the next token should be. (My intent
was to partially confuse with that statement... ; Code like that can be
mighty confusing)


I will agree that code like that can be awful in terms of coupling. The
problem isn't the _concept_, just the _implementation_.


A parser-generator like ANTLR in PCCTS can decide upon which
alternatives to take based on that symantic information - no dependency
between parser/scanner state. A good implementation for the concept.


(Cool conversation, BTW)
-- Scott


Daniel J. Salomon wrote:
>
> Andrew Tucker <ast@halcyon.com> wrote:
> |> 3) It is the very picture of module coupling and cohesion.
> Breaking
> |> down the process into front/middle/back ends and making each one
> |> independent of each other is a textbook case that predates the
> |> ideas from structured design.
>
> Actually most compilers are a disaster in terms of module coupling. ...


--
Scott Stanchfield Santa Cruz,CA
Visit my PCCTS Tutorial at http://www.scruz.net/~thetick/pcctstut
Visit our new baby at http://www.scruz.net/~thetick/claire




--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.