Re: include files, was lex - how to get current file position of parser file

Kaz Kylheku <kaz@kylheku.com>
Fri, 4 Sep 2015 05:26:23 +0000 (UTC)

          From comp.compilers

Related articles
lex - how to get current file position of parser file jkallup@web.de (Jens Kallup) (2015-09-01)
Re: lex - how to get current file position of parser file kaz@kylheku.com (Kaz Kylheku) (2015-09-01)
Re: lex - how to get current file position of parser file federation2005@netzero.com (2015-09-03)
Re: include files, was lex - how to get current file position of parse kaz@kylheku.com (Kaz Kylheku) (2015-09-04)

From: Kaz Kylheku <kaz@kylheku.com>
Newsgroups: comp.compilers
Date: Fri, 4 Sep 2015 05:26:23 +0000 (UTC)
Organization: Aioe.org NNTP Server
References: 15-08-019 15-09-001 15-09-002
Keywords: parse, design
Posted-Date: 04 Sep 2015 16:42:42 EDT

On 2015-09-03, federation2005@netzero.com <federation2005@netzero.com> wrote:
> On Thursday, September 3, 2015 at 10:02:20 AM UTC-5, Kaz Kylheku wrote:
>> Another approach (if you can design the language that way) is to specify
>> reentrant parser and lexer.
>
> If you can get a clean unambiguous cut this should work. But it bears
> to point out that the reason you don't see reentrancy very much is
> that the LR framework is not really compatible with it. Lookaheads
> have to all be lined up, for a clean descent to a sub-level.


While that is true, it is not an impediment at all. Reentrant parsers
involve the use of completely separate parser/lexer instances operating
on separate streams. Each instance maintains its own lookahead.


For instance, suppose that parser P0 (top level) reduces an include
construct:


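          include : INCLUDE file
                    {
                      /* Pseudo code; assumes a reentrant yyparse that
                         takes the parser instance as its argument. */
                      parser_t P1;           /* our custom parser type */
                      parser_init(&P1, $2);  /* opens file */

                      yyparse(&P1);          /* nested parse of the included file */

                      $$ = P1.abstract_syntax_tree;
                    }
                  ;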


Even if P0 has read the next token past the "INCLUDE file" syntax
(and based on that token, in fact, it has decided to reduce
this rule), that doesn't affect anything going on in parser P1.
It has its own file open, own lexer with its own token stream, its
own Yacc stack, its own lookahead token.


When the nested yyparse is done, we have the parse; we can integrate
it into the outer parse, and keep going.
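
To make the independence of the instances concrete, here is a minimal
hand-written sketch in C rather than Yacc. Everything in it is an assumption
made up for illustration: the toy language (numbers plus "include NAME"), the
in-memory sources table standing in for files, and the names parse_sum and
parse_named. As in the LR case, the outer instance has already read the token
after the include before the nested instance starts, and neither disturbs the
other.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds for a toy language: numbers and "include NAME". */
enum tok { T_EOF, T_NUM, T_INCLUDE, T_NAME };

/* One parser instance: its own input position and its own lookahead. */
typedef struct parser {
    const char *pos;   /* read position in this instance's input */
    enum tok look;     /* lookahead token kind */
    long num;          /* value when look == T_NUM */
    char name[32];     /* text when look == T_NAME */
} parser_t;

/* In-memory stand-in for the file system (an assumption of this sketch). */
struct source { const char *name, *text; };
static const struct source sources[] = {
    { "main", "1 include inc 2" },
    { "inc",  "10 20" },
};

static const char *find_source(const char *name)
{
    for (size_t i = 0; i < sizeof sources / sizeof sources[0]; i++)
        if (strcmp(sources[i].name, name) == 0)
            return sources[i].text;
    return NULL;
}

/* Lexer: advances the lookahead of this instance only. */
static void next_token(parser_t *p)
{
    while (isspace((unsigned char)*p->pos)) p->pos++;
    if (*p->pos == '\0') { p->look = T_EOF; return; }
    if (isdigit((unsigned char)*p->pos)) {
        char *end;
        p->num = strtol(p->pos, &end, 10);
        p->pos = end;
        p->look = T_NUM;
        return;
    }
    size_t n = 0;
    while (*p->pos && !isspace((unsigned char)*p->pos)) {
        if (n < sizeof p->name - 1) p->name[n++] = *p->pos;
        p->pos++;
    }
    p->name[n] = '\0';
    p->look = strcmp(p->name, "include") == 0 ? T_INCLUDE : T_NAME;
}

static void parser_init(parser_t *p, const char *text)
{
    assert(text != NULL);
    p->pos = text;
    next_token(p);     /* prime the lookahead */
}

/* Returns the sum of all numbers, following includes; -1 on error. */
static long parse_sum(parser_t *p)
{
    long sum = 0;
    while (p->look != T_EOF) {
        if (p->look == T_NUM) {
            sum += p->num;
            next_token(p);
        } else if (p->look == T_INCLUDE) {
            next_token(p);
            if (p->look != T_NAME) return -1;
            const char *text = find_source(p->name);
            if (text == NULL) return -1;
            next_token(p);         /* outer lookahead is now past the include */
            parser_t inner;        /* fresh instance with separate state */
            parser_init(&inner, text);
            long sub = parse_sum(&inner);
            if (sub < 0) return -1;
            sum += sub;            /* integrate the nested result */
        } else {
            return -1;
        }
    }
    return sum;
}

long parse_named(const char *name)
{
    const char *text = find_source(name);
    if (text == NULL) return -1;
    parser_t p;
    parser_init(&p, text);
    return parse_sum(&p);
}
```

Calling parse_named("main") walks "1 include inc 2": the outer instance is
already holding the 2 as its lookahead while the inner instance consumes
"10 20", and the 2 is still there when the inner one returns. The sketch does
not guard against include cycles.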


> This issue is what underlies the frequent-occurrence of the "advanced
> entry point" hack (e.g. starting up a sub-parser for "expressions" in
> a typical Algol language), where the sub-level starts up one or more
> tokens past the starting point of the thing being parsed.


Parsing a subset of the grammar, like just an "expression", is a different issue,
though related because it behooves you to use reentrant parsers. However, the
use of reentrant parsers doesn't necessarily mean you are doing such a thing.


I have experience in this area. Look for SECRET_ESCAPE_E in this
Yacc file:


http://www.kylheku.com/cgit/txr/tree/parser.y


The E stands for expression. SECRET_ESCAPE_E is a token that is never
produced by the lexer. It is injected into the token stream via a "token unget"
type operation (like ungetc(c, stream) but for a token, not a character).


To see how this is used, look in this file:


http://www.kylheku.com/cgit/txr/tree/parser.c


for a function called "prime_parser". When we want to call yyparse to
read another expression from the stream, we must not only prime the parser
with the SECRET_ESCAPE_E token, but we must also inject the lookahead token
from the previously parsed expression!


For the sake of this, I support multiple tokens of pushback in the token
stream (up to four, but I only ever use two).


If there is a "yychar" from a recent parse (yychar is the Yacc name,
as you know, of the lookahead token), then we push that first.
Then we push the secret escape token.


Blam: call yyparse and it is fooled. The secret token guides it into the
correct area of the grammar to parse what we want and the pushed yychar
restores its continuation context. Like reloading the registers of a thread
and dispatching so it continues where it was.
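
The pushback order described above can be sketched in C roughly as follows.
This is not the TXR code: the token_stream_t type, the ts_* functions,
prime_stream, the token codes, and the array standing in for the lexer are
all hypothetical names invented for the illustration, and semantic values
(yylval) are left out entirely. The point is only the ordering: the saved
lookahead goes onto the pushback stack first, the escape token second, so
the escape token comes out first.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_PUSHBACK = 4 };   /* up to four tokens of pushback */

/* Hypothetical token codes; SECRET_ESCAPE_E never comes from the lexer. */
enum {
    NO_TOKEN = -1,
    TOK_EOF = 0,
    TOK_NUM = 1,
    TOK_PLUS = 2,
    SECRET_ESCAPE_E = 100
};

typedef struct token_stream {
    const int *src;              /* stand-in for the real lexer */
    size_t len, pos;
    int pushed[MAX_PUSHBACK];    /* pushback stack: last pushed is read first */
    int npushed;
} token_stream_t;

void ts_init(token_stream_t *ts, const int *src, size_t len)
{
    ts->src = src;
    ts->len = len;
    ts->pos = 0;
    ts->npushed = 0;
}

/* "Token unget": the analogue of ungetc(c, stream) for tokens. */
void ts_unget(token_stream_t *ts, int tok)
{
    assert(ts->npushed < MAX_PUSHBACK);
    ts->pushed[ts->npushed++] = tok;
}

/* What a yylex wrapper would do: drain pushback first, then lex. */
int ts_next(token_stream_t *ts)
{
    if (ts->npushed > 0)
        return ts->pushed[--ts->npushed];
    return ts->pos < ts->len ? ts->src[ts->pos++] : TOK_EOF;
}

/* Prime the stream before a fresh yyparse-style call: push the lookahead
   left over from the previous parse (if any), then the escape token. */
void prime_stream(token_stream_t *ts, int saved_lookahead, int escape)
{
    if (saved_lookahead != NO_TOKEN)
        ts_unget(ts, saved_lookahead);
    ts_unget(ts, escape);
}
```

After prime_stream(&ts, TOK_PLUS, SECRET_ESCAPE_E), the next calls to ts_next
return SECRET_ESCAPE_E, then TOK_PLUS, then whatever the underlying stream
yields. Only two of the four pushback slots are needed for this.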


I suppose we could call this ... Yacc/cc: Yacc with current continuation.


OMG kill me now. :)

