Related articles
Q2. Why do you split a monolitic grammar into the lexing and parsing r spam@abelectron.com (valentin tihomirov) (2005-02-20) |
Re: Q2. Why do you split a monolitic grammar into the lexing and parsi rh26@humboldt.edu (Roy Haddad) (2005-03-04) |
Re: Q2. Why do you split a monolithic grammar into the lexing and pars spam@abelectron.com (valentin tihomirov) (2005-03-08) |
Re: Q2. Why do you split a monolithic grammar into the lexing and pars ndrez@att.net (Norm Dresner) (2005-03-08) |
From: "valentin tihomirov" <spam@abelectron.com>
Newsgroups: comp.compilers
Date: 8 Mar 2005 01:03:51 -0500
Organization: Compilers Central
References: 05-02-087 05-03-019
Keywords: lex, design
Posted-Date: 08 Mar 2005 01:03:51 EST
> I'm assuming this would be like being able to do this in C:
> int else = 4;
> printf("%i",else);
Nice example; I was thinking of something Pascal-ish:
procedure do();
> A character is a different class of object than a word, and the two
> operate on different levels.
One bit, one sound, or one character is not enough to encode all the
notions in the world. Fortunately for mankind, the ancient Latins, as
opposed to the Chinese, were wise enough to construct words from
letters. You are right that most words encode notions in natural
languages, where the letters conform to sounds in oral speech (English
has lost this natural relationship). Such a separation looks natural
in the human world. Formal languages, however, do not have this
two-level restriction; they may have as many levels as necessary. When
I speak about a context-free language, I see a syntax tree.
> Tokens that are considered to have direct
> meaning, like ELSE, IF, WHILE, etc.
... and the wrong meaning will be assigned by the lexer to ELSE in your
example above. ELSE may carry any sense depending on the context (as
can most words in natural languages, whose grammars are considered
phrase-structure or context-sensitive). Recognizers do not worry about
meaning at all; both tokens and letters are just symbols in formal
languages.
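
To make the complaint concrete, here is a minimal sketch (Python,
hypothetical token names, not any real tool) of what a conventional
standalone lexer does: it labels the word "else" as the keyword ELSE
unconditionally, so a declaration such as "int else = 4;" is already
mis-tokenized before the parser ever sees it.

    # Sketch of a conventional context-free lexer (hypothetical token names).
    KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "int": "INT"}

    def lex(text):
        tokens = []
        for word in text.replace(";", " ; ").replace("=", " = ").split():
            if word in KEYWORDS:
                tokens.append((KEYWORDS[word], word))  # keyword, wherever it occurs
            elif word.isidentifier():
                tokens.append(("ID", word))
            elif word.isdigit():
                tokens.append(("NUM", word))
            else:
                tokens.append((word, word))            # punctuation passes through
        return tokens

    # "else" is labelled ELSE even though it is used as a variable name here:
    print(lex("int else = 4;"))
    # [('INT', 'int'), ('ELSE', 'else'), ('=', '='), ('NUM', '4'), (';', ';')]
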
> and because of this, with a unified grammar I would want to have all
> "words" constructed out of "letters" and nothing else derived from
> "letters",
Which class do literal tokens belong to? IF, THEN, ELSE are separator
marks:
IF expr THEN stat ELSE stat;
Some words (known as phrases) consist of two or more words.
> so there would be no practical difference between a separate
> lexer and parser and a unified parser...
The advantage is a known context for the symbols. Even though we do
not use context-sensitive languages, the words still have a
context-dependent meaning. In addition, the lexer cannot compute the
LL(k) lookahead, because it cannot predict what comes after the end of
a token; it has to assume that any token can follow, which results in
huge ambiguities (ANTLR simply suppresses them so as not to confuse
the user). Meanwhile, a top-down parser knows the start symbol and has
the FOLLOW(token) sets at its disposal.
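
To illustrate that point (this is a hypothetical sketch, not any
particular tool's API), imagine the parser passing the set of token
types it currently expects down to the tokenizer; the same word is then
classified differently depending on where the parse is.

    # Hypothetical context-fed tokenizer: the parser tells it which token
    # types are acceptable at the current point in the derivation.
    KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE"}

    def next_token(word, expected):
        kw = KEYWORDS.get(word)
        if kw is not None and kw in expected:
            return (kw, word)        # keyword only where the grammar allows one
        return ("ID", word)          # otherwise fall back to an identifier

    # After "int" the parser expects a declarator, i.e. only ID:
    print(next_token("else", expected={"ID"}))          # ('ID', 'else')
    # After the THEN branch of an if-statement, ELSE is expected:
    print(next_token("else", expected={"ELSE", "ID"}))  # ('ELSE', 'else')
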
> So I don't think the problem is artificial, or at least the problem is
> not artificial in quite that way.
Humans bring their (natural) way of thinking into formal language
theory, which already has a general device for describing languages.
> The most natural solution I can think of that works purely by
> grammar is to make the lexer and parser non-deterministic, so 'else'
> above would come out of the lexer as the set { ID, ELSE }.
As you see, the lexer prematurely assigns a meaning to the "word"
regardless of its context. A context is the placeholder where a token
(a syntactically structured sequence of symbols) can appear in a
production rule.
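
For comparison, here is a hedged sketch of the non-deterministic
alternative described in the quote (hypothetical names again): the
lexer emits the whole candidate set and defers the choice, and the
parser intersects that set with whatever its current rule allows.

    # Non-deterministic lexing sketch: each word maps to a *set* of candidate
    # token types; disambiguation is deferred to the parser.
    KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE"}

    def lex_candidates(word):
        # A keyword-shaped word could be either the keyword or an identifier.
        return {KEYWORDS[word], "ID"} if word in KEYWORDS else {"ID"}

    def resolve(word, expected):
        choices = lex_candidates(word) & expected
        if len(choices) != 1:
            raise SyntaxError(f"ambiguous or unexpected token {word!r}: {choices}")
        return (choices.pop(), word)

    print(lex_candidates("else"))              # {'ELSE', 'ID'} (set order may vary)
    print(resolve("else", expected={"ID"}))    # ('ID', 'else')
    print(resolve("else", expected={"ELSE"}))  # ('ELSE', 'else')
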
[The most plausible argument I've seen for syntax design without
reserved words was for PL/I, which had so many of them that it was
impractical for programmers to remember them all. Assuming you
have a more parsimonious language than PL/I or Cobol, the ability
to use IF or FOR as a variable is more likely to be confusing than
useful. -John]