Re: Q2. Why do you split a monolithic grammar into the lexing and parsing rules?

mefrill@yandex.ru (Vladimir)
28 Feb 2005 00:49:28 -0500

          From comp.compilers


From: mefrill@yandex.ru (Vladimir)
Newsgroups: comp.compilers
Date: 28 Feb 2005 00:49:28 -0500
Organization: http://groups.google.com
References: 05-02-087
Keywords: lex
Posted-Date: 28 Feb 2005 00:49:28 EST

As is well known, it is not possible to write a CF grammar that
describes a language in which identifiers must be declared before they
are used. This condition cannot be expressed by a CF grammar, only by
a context-sensitive one. So, since most syntax analysers are written to
parse CF languages, the condition "declaration before use" is dropped
from the grammar and implemented as a "semantic" property of the
compiler, in its name tables. The lexer then returns a single terminal
"ID" with the identifier's string as its "semantic" value. It is much
easier to use such a strategy than to incorporate name parsing into a
context-sensitive grammar. Another reason to use a lexical analyser is
that the language of identifiers is regular and can be recognized by a
finite state machine. It is the obvious human way to divide a complex
task into a number of easier ones.
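
To make the division of labour concrete, here is a minimal sketch in
Python. Everything in it is an assumption made for illustration: the toy
language (only "var x;" declarations and "x = y;" assignments), the
keyword set, and the names lex, expect and undeclared_uses. The lexer
uses a regular expression and hands back one terminal ID with the name
as its value; the CF-style parser never looks at the spelling; the
"declaration before use" check lives in a name table.

```python
import re

# Toy language (hypothetical): "var x;" declares, "x = y;" assigns.
KEYWORDS = {"var"}
# Words form a regular language, so a regular expression is enough.
TOKEN_RE = re.compile(r"\s*(?:(?P<word>[a-z]+)|(?P<punct>[=;]))")


def lex(text):
    """Return (kind, value) pairs; every non-keyword word is the terminal ID."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        word = m.group("word")
        if word is None:
            tokens.append((m.group("punct"), None))
        elif word in KEYWORDS:
            tokens.append((word.upper(), None))
        else:
            # The spelling travels along as the "semantic" value.
            tokens.append(("ID", word))
    return tokens


def expect(tokens, i, kind):
    """Return the value of token i, or fail if it is not of the given kind."""
    if i >= len(tokens) or tokens[i][0] != kind:
        raise SyntaxError(f"expected {kind} at token {i}")
    return tokens[i][1]


def undeclared_uses(text):
    """Parse the token stream and list names used before their declaration."""
    tokens = lex(text)
    declared, errors = set(), []  # 'declared' plays the role of the name table
    i = 0
    while i < len(tokens):
        if tokens[i][0] == "VAR":
            # Grammar rule: decl -> VAR ID ';'
            name = expect(tokens, i + 1, "ID")
            expect(tokens, i + 2, ";")
            declared.add(name)
            i += 3
        else:
            # Grammar rule: assign -> ID '=' ID ';'
            target = expect(tokens, i, "ID")
            expect(tokens, i + 1, "=")
            source = expect(tokens, i + 2, "ID")
            expect(tokens, i + 3, ";")
            # The context-sensitive condition is checked here, not in the grammar.
            for name in (target, source):
                if name not in declared:
                    errors.append(name)
            i += 4
    return errors
```

For example, undeclared_uses("var a; var b; a = b; b = c;") reports only
c, although the grammar itself accepts every statement in that input.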




"valentin tihomirov" <spam@abelectron.com> wrote in message news:05-02-087...
> Looks like this is a group discussing parsing-translating issues as a matter
> of compiler front-ends. So, the problem is to enable using reserved keywords
> as identifiers. The problem is purely artificial; the input "(name name)"
> can be generated by the
>
> name -> '(' 'name' ID ')';
> ID -> ('a'..'z')+;
>
> CF grammar; therefore, it must be recognizable. However, the lexer will turn
> the 2nd "name" into a literal token which is different from the ID token. As a
> result, the token stream "LP NAME NAME RP" will not match the input. The
> issue simply does not exist for unified grammar recognizers. On the one hand,
> it looks natural that we need to combine letters into words, as it is easier
> to process a text as a stream of words and separators. In addition, the
> grammar of natural languages explicitly introduces the alphabet (set of
> letters) and dictionary (set of words). Nevertheless, a solid grammar does
> not deny constructing high-level terms from the letters or other terms.
> Actually, a grammar is a more powerful device than a stream of words, as it
> is not limited to two levels. The ANTLR's lexer is based on the same
> algorithms as the parser; that is, it can perform a complete translation.
> So, I do not understand why we need the artificial obstacle, the 2nd
> level?
> [Try writing a single-level parser that disregards spaces and optional spaces in
> the usual way. A separate lexer makes that a whole lot easier. -John]
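
The mismatch described in the quoted message can be sketched in a few
lines of Python. The function names and the workaround in the last
function are assumptions made for illustration, not something proposed
in the thread; the token names LP, NAME, ID and RP and the rule are
taken from the quoted message. The tokenizer here silently skips any
character that is not a lowercase letter or a parenthesis.

```python
import re


def lex(text):
    """Context-free tokenizer: the word 'name' always becomes NAME."""
    tokens = []
    for m in re.finditer(r"[a-z]+|[()]", text):
        s = m.group()
        if s == "(":
            tokens.append("LP")
        elif s == ")":
            tokens.append("RP")
        elif s == "name":
            tokens.append("NAME")  # keyword, even where an ID was meant
        else:
            tokens.append("ID")
    return tokens


def matches_rule(tokens):
    """Quoted rule: name -> '(' 'name' ID ')', taken literally."""
    return tokens == ["LP", "NAME", "ID", "RP"]


def matches_rule_keyword_as_id(tokens):
    """Hypothetical workaround: also accept NAME where ID is expected."""
    return (
        len(tokens) == 4
        and tokens[0] == "LP"
        and tokens[1] == "NAME"
        and tokens[2] in ("ID", "NAME")
        and tokens[3] == "RP"
    )
```

With this sketch, lex("(name name)") gives LP NAME NAME RP, which
matches_rule rejects and matches_rule_keyword_as_id accepts, while
"(name foo)" passes both.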


