Re: Why separate Lexical & Parser Generators

morrison@hal.cs.uiuc.edu (Vance Morrison)
Fri, 7 Oct 1994 13:31:14 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: morrison@hal.cs.uiuc.edu (Vance Morrison)
Keywords: lex, yacc
Organization: University of Illinois at Urbana
References: 94-10-028
Date: Fri, 7 Oct 1994 13:31:14 GMT

John Heron <heronj@smtplink.NGC.COM> writes:


>Pardon me if this question is naive. Why have a separate parser generator
>and lexical analyzer generator? ...


I looked into this question at one time and tried to build a parser that
operated directly on the input character stream instead of tokens.


One problem I ran into is lookahead. Parsers generated by tools such as
YACC typically have only one token of lookahead. If the input symbols are
individual characters instead of whole tokens, that single symbol of
lookahead tells the parser much less, which leads to greater ambiguity.
For example, one production might apply to a string of characters if the
next token is the `FOR' keyword, and another might apply to that same
string if the next token is an identifier. A traditional parser has no
problem with that, but one based on characters does: given the single
lookahead character 'F', it cannot determine whether the lookahead is a
keyword or an identifier.
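
A rough sketch of the difference, in Python rather than YACC. The keyword
set and both function names are made up for illustration; this is not the
parser I built, only a way to see what one character of lookahead can and
cannot tell you compared with one scanned token.

```python
import re

KEYWORDS = {"FOR"}  # hypothetical keyword set, for illustration only
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def candidates_from_char(text, pos):
    """Token kinds still possible after seeing only the character at pos."""
    ch = text[pos]
    kinds = set()
    if ch.isalpha() or ch == "_":
        kinds.add("IDENT")
    # Any keyword starting with this character is still possible too.
    kinds.update(kw for kw in KEYWORDS if kw.startswith(ch))
    return kinds


def kind_from_token(text, pos):
    """Token kind once a lexer has scanned the whole word (longest match)."""
    m = WORD.match(text, pos)
    if m is None:
        return None
    word = m.group()
    return word if word in KEYWORDS else "IDENT"


if __name__ == "__main__":
    for src in ("FOR i", "FORK i"):
        print(src, sorted(candidates_from_char(src, 0)), kind_from_token(src, 0))
```

For both inputs the character-level view leaves two candidates, while the
token-level view gives a single answer, because the lexer has already read
to the end of the word before the parser is asked to decide.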


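Another problem is dealing with comments, because they can occur ANYWHERE:
without a lexer to discard them, the grammar itself has to allow a comment
between any two symbols. This can be solved by simply restricting where
comments can go (which may not be a bad idea in any case).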
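
For comparison, here is a minimal sketch of how a separate lexer makes the
comment problem disappear. It assumes C-style /* ... */ comments and a toy
token set; the names TOKEN and tokens are hypothetical. Comments and
whitespace are matched like any other token and then dropped, so the
parser never has to mention them.

```python
import re

# Toy token set; the comment syntax (C-style) is an assumption.
TOKEN = re.compile(r"""
    (?P<COMMENT>/\*.*?\*/)              # comment, may span lines
  | (?P<WS>\s+)                         # whitespace
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)   # identifier
  | (?P<NUMBER>\d+)                     # integer literal
  | (?P<OP>[-+*/=();])                  # single-character operators
""", re.VERBOSE | re.DOTALL)


def tokens(text):
    """Return (kind, lexeme) pairs, discarding comments and whitespace."""
    out = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup not in ("COMMENT", "WS"):
            out.append((m.lastgroup, m.group()))
        pos = m.end()
    return out


if __name__ == "__main__":
    print(tokens("x = /* note */ 1 + /* multi\nline */ y;"))
```

The token stream is the same with or without the comments, which is the
property a character-level grammar would have to spell out in every rule.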


Vance
--

