Re: Handling streaming input

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 31 Dec 2004 13:04:00 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 04-12-152
Keywords: tools
Posted-Date: 31 Dec 2004 13:04:00 EST

Sriyansa asked:
> I am trying to write a parser module which takes in a streaming input
> and converts it to an operation stack. All the steps (char stream ->
> token stream -> operation stack) are asynchronous and the modules read
> from and write into incomplete data structures.


I cannot attest to how easy it is to do in ANTLR, but with the right
object model and the right parsing model, it is not difficult to do.
In particular, we structure our parsing library, Yacc++ and the
Language Objects Library, around exactly such a metaphor.


So, what is the right object model? Fairly simple. The "input
object" must be a separate class from the lexer and parser objects,
and that class must be polymorphic. I think ANTLR and JavaCC both
have input objects as a separate class and access that class
polymorphically.
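
For illustration, here is a minimal sketch (in Java) of what such a
separate, polymorphic input class might look like. The names
(CharSource, StringSource, nextChar) are hypothetical and are not the
actual ANTLR, JavaCC, or Yacc++ API:

    // Hypothetical interface: the lexer is written against this and
    // never knows where the characters actually come from.
    interface CharSource {
        // Returns the next character, or -1 if none is available
        // (end of stream, or the data simply hasn't arrived yet).
        int nextChar();
    }

    // One concrete source; a file- or socket-backed source would
    // implement the same interface and the lexer would not change.
    class StringSource implements CharSource {
        private final String data;
        private int pos = 0;
        StringSource(String data) { this.data = data; }
        public int nextChar() {
            return pos < data.length() ? data.charAt(pos++) : -1;
        }
    }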


Now, what is the right parsing model? There are two choices.


1) One can have an "event-driven" parsing model--that's what Yacc++
      uses. In an event-driven model, the control flow is inverted: the
      input object drives the lexer when it has new input, and the lexer
      drives the parser when it has new tokens. Part of successfully
      having an event-driven model is having all the state of the driven
      objects be in the object and none of it implicit in the call stack
      (i.e. no persistent local variables and no use of the calling
      hierarchy to encode position). The second restriction rules out
      recursive descent, since that methodology is rooted in keeping the
      parser position in the call hierarchy.


2) If one doesn't have an event-driven parsing model, the other
      possibility is using buffers (and possibly processes) to simulate
      it (see the sketch after this list). I've seen several parsers
      follow this approach. In the buffer model, each level doesn't
      directly communicate with the next level "up", but instead fills
      a buffer (i.e. it is a producer). The level above also doesn't
      read from the level below; it reads from the buffer (i.e. it is a
      consumer). When the buffer is empty, the upper levels "stall" and
      wait for the lower level to put something in the buffer. If one
      is using separate processes, one uses some form of process
      synchronization (e.g. semaphores) to control access to the
      buffer. However, one can also implement the scheme in a
      single-process environment. For example, I've seen several
      compilers that run the lower level anytime the buffer is empty
      and, when running the lower level, run it until the buffer is
      full (or nearly full).
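
For concreteness, here is a minimal sketch of the single-process
buffer scheme just described, written in Java. TokenBuffer,
HypotheticalLexer, and scanOne are made-up names for illustration,
not any real parser generator's API:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical producer side: returns one token per call, or
    // null if no complete token is available yet.
    interface HypotheticalLexer {
        String scanOne();
    }

    // The buffer between the lexer (producer) and parser (consumer).
    class TokenBuffer {
        private static final int CAPACITY = 64;
        private final Deque<String> tokens = new ArrayDeque<>();
        private final HypotheticalLexer lexer;

        TokenBuffer(HypotheticalLexer lexer) { this.lexer = lexer; }

        // The parser calls this; when the buffer is empty, we run
        // the lower level (the lexer) until the buffer is full or
        // the lexer has nothing more to give.
        String nextToken() {
            if (tokens.isEmpty()) {
                String t;
                while (tokens.size() < CAPACITY
                        && (t = lexer.scanOne()) != null) {
                    tokens.addLast(t);
                }
            }
            return tokens.pollFirst();  // null means "stall": no input yet
        }
    }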


Note, in actuality the Language Objects Library does both. It uses
inverted control flow because that has certain nice properties (in
particular, it gives most of the advantages of having separate
processes without the overhead--the inverted control flow acts like
the semaphores, causing the top-level code to run exactly when it has
data available), and it has buffers that fill and empty. The only
thing not implemented in the Language Objects Library is putting the
producers and consumers in separate processes (or threads).
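
To make the inverted control flow concrete, here is a minimal Java
sketch of the event-driven (push) arrangement described in (1). The
classes and methods (TokenSink, PushLexer, PushParser, onChar,
onToken) are hypothetical, not the Yacc++ or Language Objects Library
runtime:

    // The parser exposes a callback; the lexer pushes tokens into it.
    interface TokenSink {
        void onToken(String token);
    }

    // All lexer state lives in the object (the partial token being
    // built); none of it is kept on the call stack between events.
    class PushLexer {
        private final StringBuilder current = new StringBuilder();
        private final TokenSink parser;

        PushLexer(TokenSink parser) { this.parser = parser; }

        // Called by the input object whenever a new character arrives,
        // e.g. from a socket read loop.
        void onChar(char c) {
            if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    parser.onToken(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
    }

    // A stand-in parser; a real one would update an explicit parse
    // stack here instead of keeping its position in the call hierarchy.
    class PushParser implements TokenSink {
        public void onToken(String token) {
            System.out.println("parsed: " + token);
        }
    }

The input loop then reduces to "read whatever bytes are available and
hand each character to lexer.onChar(...)", and the parser runs exactly
as far as the available data allows.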


In any case, the input object is separate from the grammar and there
are no grammar changes to use a streaming methodology. The effect of
streaming is entirely in the "run-time library". If you want to adapt
ANTLR to support such an input model, you won't change anything in
your grammar. It will all be in ancillary code.


Of course, the next question comes down to how much of the ANTLR
support is contained in the code generated for each grammar and how
much is in the run-time library. With Yacc++, very little is actually
generated by the parser generator--it is effectively mostly some
tables of integers and some class declarations. Other parser
generators have struck other balances. In general, recursive descent
parser generators tend to generate more of the mechanics as part of
the generated code and less in a library, because they depend on the
"host language semantics" to get the implicit parser position via the
calling hierarchy.


However, I know that ANTLR has some routines that do the lexer-parser
interface in a "library", so it is not doomed to be hopeless. On a
similarly positive note, ANTLR has more than one "code generator",
which means it is feasible to change the generated output if the
semantic hooks do need to be embedded there.


I wish I could give you a more concrete answer as to how easy the job
will be. It may be trivial, as someone may have already done the work
for you (i.e. someone else may have needed the exact model you desire
and made the appropriate changes). It may also be quite difficult,
not at the conceptual level, but due to the "irrelevant" details you
may have to resolve to make such a hack.


Note, if the support isn't trivial, I would look at some other parser
generators, as it is possible that someone has written a parser
generator/library that does exactly what you want. Now, not all of
them will be of ANTLR's quality, which is worth considering, but they
may be good enough for your needs.


Best of luck,
-Chris


*****************************************************************************
Chris Clark                  Internet : compres@world.std.com
Compiler Resources, Inc.     Web Site : http://world.std.com/~compres
23 Bailey Rd                 voice    : (508) 435-5016
Berlin, MA 01503 USA         fax      : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------

