Re: Theory about DFA's and f?lex start states

"Joachim Durchholz" <joachim_d@gmx.de>
21 Nov 2000 13:54:32 -0500

          From comp.compilers

Related articles
[4 earlier articles]
Re: Theory about DFA's and f?lex start states cfc@world.std.com (Chris F Clark) (2000-11-14)
Re: Theory about DFA's and f?lex start states joachim_d@gmx.de (Joachim Durchholz) (2000-11-14)
Re: Theory about DFA's and f?lex start states ucapjab@ucl.ac.uk (Jonathan Barker) (2000-11-14)
Re: Theory about DFA's and f?lex start states frank@gnu.de (2000-11-14)
Re: Theory about DFA's and f?lex start states pcj1@my-deja.com (2000-11-19)
Re: Theory about DFA's and f?lex start states cfc@world.std.com (Chris F Clark) (2000-11-21)
Re: Theory about DFA's and f?lex start states joachim_d@gmx.de (Joachim Durchholz) (2000-11-21)
Re: Theory about DFA's and f?lex start states vbdis@aol.com (2000-11-22)
| List of all articles for this month |

From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 21 Nov 2000 13:54:32 -0500
Organization: Compilers Central
References: 00-11-073 00-11-080 00-11-083 00-11-100 00-11-134
Keywords: lex
Posted-Date: 21 Nov 2000 13:54:32 EST

<pcj1@my-deja.com> wrote:
> "Jonathan Barker" <ucapjab@ucl.ac.uk> wrote:
> But stepping back from a practical standpoint for a second: Assume I'm
> a developer writer trying to write a grammar for my new language. I'm
> struggling to write the lexer because I've overloaded the syntax to
> the point that I can't seem to achieve what I want because many of the
> tokens have overlapping definitions (they cause conflicts in the
> DFA). So the (natural?) thing is to try to partition those token
> definitions into several distinct layers and switch between them when
> appropriate.
>
> [...]
>
> Any thoughts? Mine are increasingly less productive as it is nearly
> 5am now.


This sounds as if your design has grown to a complexity beyond what
anybody would want to maintain. Assume that you're getting the thing
to run tomorrow: the language will evolve. (Looking at the situation
you describe, I'd actually assume that it already has evolved quite a
bit!) You're facing the prospect of getting it to work as intended
with some new extension every other year or so. If you don't find a
clean theory of what your code is doing, you'll never be able to test
it (you can't do an exhaustive test on all conceivable character
strings, so you'll keep overlooking pathological cases).


I think your best option is to scrap your current design and try
something different. For example:
* Just use a "sane" lexical structure. If there is legacy code, use the
old lexer and make it emit code in the new syntax (that's easy enough if
the changes are purely lexical, though you can spend an arbitrary effort
trying to keep comments aligned). Sell it to your customers/users as
"new, improved, clearer syntax".
* Use a simplified lexing process. Let the lexer just return tokens, and
have some other layer of the language processor determine what type of
overload happened to this token. (If you're lucky the parser will be
able to do that; if not, you'll have to resort to things like
symbol-table lookup, which creates all sorts of ugly two-way inter-layer
dependencies.)
* Use a preprocessor that prefixes overloaded tokens with characters
that make a standard lexer work. (This was something that I've seen done
on a truly horrible syntax. I left the project before it worked, but
I've been told it went well.)
* Use microtokens. Identify recurring patterns in your tokens. Have a
lexer run over the code and produce microtokens; then run the stream of
microtokens through another lexer that builds the ordinary macrotokens.
(Well, you might need a parser for the micro->macro conversion.)


The latter two approaches are instances of what I call "layerization".
It's a useful technique in general: try to allocate various processing
steps to a "high" and a "low" level, then look whether the resulting
inter-layer interface looks "clean enough". If it does, good; if not,
try a different division; if several attempts at layerization fail,
declare the problem to be too unmodular to be tractable ;)


Regards,
Joachim


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.