Q2. Why do you split a monolithic grammar into the lexing and parsing rules?

"valentin tihomirov" <spam@abelectron.com>
20 Feb 2005 16:51:51 -0500

          From comp.compilers


From: "valentin tihomirov" <spam@abelectron.com>
Newsgroups: comp.compilers
Date: 20 Feb 2005 16:51:51 -0500
Organization: Compilers Central
Keywords: parse, question
Posted-Date: 20 Feb 2005 16:51:51 EST

This looks like a group that discusses parsing and translation issues in
compiler front ends. So, the problem is how to allow reserved keywords to be
used as identifiers. The problem is purely artificial: the input "(name name)"
can be generated by the


        name -> '(' 'name' ID ')';
        ID -> ('a'..'z')+;


CF grammar; therefore, it must be recognizable. However, the lexer will turn
the 2nd "name" into a literal (keyword) token, which is different from the ID
token. As a result, the token stream "LP NAME NAME RP" will not match the
rule, which expects "LP NAME ID RP". The issue simply does not exist for
unified grammar recognizers. On the one hand, it looks natural that we need to
combine letters into words, as it is easier to process a text as a stream of
words and separators. In addition, the grammars of natural languages
explicitly introduce an alphabet (a set of letters) and a dictionary (a set of
words). Nevertheless, a monolithic grammar does not prevent constructing
high-level terms from letters or from other terms. Actually, a grammar is a
more powerful device than a stream of words, as it is not limited to two
levels. ANTLR's lexer is based on the same algorithms as its parser; that is,
it can perform a complete translation. So, I do not understand why we need
this artificial obstacle, the 2nd level.
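
To make the mismatch concrete, here is a minimal sketch in Python (not ANTLR).
It assumes a context-free lexer that always maps the word "name" to the NAME
token, and compares it with a single-level recognizer written directly over
characters. The helper names (lex, parse_tokens, parse_chars) and the WORD and
WS token kinds are hypothetical, and the whitespace handling is an assumption,
since the grammar above does not mention spaces. The example rule happens to
be regular, so one regular expression stands in for the unified recognizer.

```python
import re

# Token kinds LP, NAME, ID and RP come from the post; WORD and WS are
# hypothetical helpers used only inside this sketch.
TOKEN_SPEC = [
    ("LP", r"\("),
    ("RP", r"\)"),
    ("WORD", r"[a-z]+"),
    ("WS", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
KEYWORDS = {"name": "NAME"}


def lex(text):
    """Context-free lexer: every occurrence of 'name' becomes NAME."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue  # whitespace is dropped before the parser sees anything
        if kind == "WORD":
            kind = KEYWORDS.get(m.group(), "ID")
        tokens.append(kind)
    return tokens


def parse_tokens(tokens):
    """Two-level parser for the rule: name -> LP NAME ID RP."""
    return tokens == ["LP", "NAME", "ID", "RP"]


# Single-level recognizer: the same rule written over characters, so the
# second word is matched as letters and is never classified as a keyword.
SINGLE_LEVEL = re.compile(r"\(\s*name\s+[a-z]+\s*\)")


def parse_chars(text):
    return SINGLE_LEVEL.fullmatch(text) is not None
```

With these definitions, lex("(name name)") yields LP NAME NAME RP, which
parse_tokens rejects, while parse_chars accepts the same input. Two-level
designs usually work around this by letting the parser accept NAME wherever
ID is allowed, or by making the lexer context-sensitive.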
[Try writing a single-level parser that disregards spaces and optional spaces in
the usual way. A separate lexer makes that a whole lot easier. -John]
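
One reading of the moderator's remark, sketched as a hand-written
character-level parser in Python for the same rule: every place where a space
may appear needs its own explicit step, and the gap between the keyword and
the identifier has to be treated as mandatory. The function names (skip_ws,
expect, parse_name) are hypothetical, and the whitespace policy is an
assumption rather than something stated in the thread.

```python
def skip_ws(text, pos):
    """Advance past optional spaces, tabs and newlines."""
    while pos < len(text) and text[pos] in " \t\n":
        pos += 1
    return pos


def expect(text, pos, literal):
    """Match a literal at pos or raise SyntaxError; return the new offset."""
    if not text.startswith(literal, pos):
        raise SyntaxError(f"expected {literal!r} at offset {pos}")
    return pos + len(literal)


def parse_name(text):
    """Character-level parser for: name -> '(' 'name' ID ')'."""
    pos = skip_ws(text, 0)
    pos = expect(text, pos, "(")
    pos = skip_ws(text, pos)      # optional space after '('
    pos = expect(text, pos, "name")
    after_kw = pos
    pos = skip_ws(text, pos)      # space between keyword and ID ...
    if pos == after_kw:           # ... must be present, or "(namefoo)" passes
        raise SyntaxError(f"expected whitespace at offset {pos}")
    start = pos
    while pos < len(text) and "a" <= text[pos] <= "z":
        pos += 1
    if pos == start:
        raise SyntaxError(f"expected identifier at offset {pos}")
    ident = text[start:pos]
    pos = skip_ws(text, pos)      # optional space before ')'
    pos = expect(text, pos, ")")
    pos = skip_ws(text, pos)      # optional trailing space
    if pos != len(text):
        raise SyntaxError(f"trailing input at offset {pos}")
    return ident
```

Even for this one-rule grammar, five of the steps exist only to deal with
spacing. With a separate lexer, that handling lives in a single place and the
parser rule stays as short as the grammar.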


