Parser classifications?

Alexander Morou <alexander.morou@gmail.com>
Tue, 11 Aug 2009 16:58:54 -0700 (PDT)

          From comp.compilers

Related articles
Parser classifications? alexander.morou@gmail.com (Alexander Morou) (2009-08-11)
Re: Parser classifications? zaimoni@zaimoni.com (Kenneth 'Bessarion' Boyd) (2009-08-13)
Re: Parser classifications? alexander.morou@gmail.com (Alexander Morou) (2009-08-13)
Re: Parser classifications? zaimoni@zaimoni.com (Kenneth 'Bessarion' Boyd) (2009-08-16)
Re: Parser classifications? alexander.morou@gmail.com (Alexander Morou) (2009-08-17)
| List of all articles for this month |

From: Alexander Morou <alexander.morou@gmail.com>
Newsgroups: comp.compilers
Date: Tue, 11 Aug 2009 16:58:54 -0700 (PDT)
Organization: Compilers Central
Keywords: parse, question
Posted-Date: 12 Aug 2009 12:19:47 EDT

Hello,


First, thanks to everyone else who's given me advice in the past, I'm
not sure if I've said it, but I do appreciate it.


Previous posts I've made, about a parser compiler project I'm working
on, have discussed my efforts in trying to find a good method to
represent a top-down class parser that uses the utmost minimal
backtracking method possible. My original aim was a recursive descent
parser, after getting into it and evaluating other such programs that
use this approach, I've decided that recursive descent probably isn't
my ideal solution.


I think I've solved the major problem: finding a way to evaluate the
grammar in a deterministic manner, that can be represented on a
machine with finite limitations.


I'm not very good at explaining things in a formal manner, so if I'm a
bit cryptic, I apologize. So far, my goal has been to break a complex
problem down into smaller steps. Here's an overview:


1. Define each of the productions (aka rules) in a deterministic view
      Separating transitions from state to state into two categories:
          a. Token transitions move from one state to another, using only
                information about the small, named, lexical patterns defined
                by a given language.
          b. Rule Transitions move from one state to another using the
                identifier of a rule as the requirement, and also give a
                pointer to the initial state of the rule represented by the
                transition. This essentially is the primary bit of
                information needed to calculate the follow sets, since it
                specifies the internal target state of matching a rule, as
                well the initial state for that rule.


2. Using the above deterministic model of the productions, define
      a lazily evaluated, also deterministic, view of the language from
      the start production. To determine the look ahead at any given
      point, the process is further divided into two phases:
      a. First evaluate the given state-trail used to reach a given
            production. Due to finite limitations, this trail only needs
            to be aware of the rules that are active in the current path
            (for follow set inclusion) and the individual states of that
            rule which can be used in place of that rule in the path (since
            they point back to it). The segmented view does not provide
            enough context information for parse graph production, but it
            does provide ample data to give the valid set of tokens at any
            given state of the language.
      b. The first step operates by looking at every rule introduced
            into the path 'stream', as well as the states that are
            encountered as a result of a rule terminating. Since two
            different paths can enter (or exit) the same rule, the second
            phase concerns itself with only the original states, since the
            path itself does not matter for calculation (but they are
            remembered for sake of continuing the path[s].)


The above describes what I have so far, it's not going to give me a
parser, but it might be suitable for recognition, I'm presently
working on a supplemental step that will work with the above system to
handle path remembrance to build a proper parse graph.


From initial testing, it appears to handle left recursion fine.
Ironically, because left recursion repeats its follow information
infinitely until there are no more tokens associated to that state
set, it performs better than other types of recursion (since for a
left-most derivative parser, inner recursion and right recursion have
to have as many trailing states as they have starting states, so the
paths become longer and longer)


http://alexandermorou.com/text/CSStreamState.html
http://alexandermorou.com/text/CSStreamState+SourcesBuilder.html
http://alexandermorou.com/text/CSStreamState+BuildTransition.html
http://alexandermorou.com/text/CSStreamState+BuildTransition+ParentChildPairing.html
http://alexandermorou.com/text/CSStreamState+RulePath.html
http://alexandermorou.com/text/CSStreamState+RulePath+FollowInfo.html
http://alexandermorou.com/text/CSStreamState+SourcedFrom.html
http://alexandermorou.com/text/CSStreamState+SourcesInfo.html
http://alexandermorou.com/text/CSStreamState+StatePath.html
http://alexandermorou.com/text/CSStreamState+TransitionTable.html
http://alexandermorou.com/text/CSStreamState+TransitionTable+KeysCollection.html
http://alexandermorou.com/text/CSStreamState+TransitionTable+ValuesCollection.html
http://alexandermorou.com/text/TokenTransition.html


The above thirteen files is the bulk of the code associated to the
second phase. They were generated by a code generator, in which I
hand wrote the logic for a smaller language and embedded the logic
into the generator. For the non-colored versions, use .cs for the
file extension.


Does anyone here know what class of grammar this will be by the time
I'm done? Are there serious issues with the concept? Initial path
discovery and look-ahead generation can be slow (100 ms on 229 tokens
for the first attempt, 135 microseconds on the second pass.) I think
a majority of this is just the price paid for it being lazily
evaluated. The good news is (if there is such a thing), parses that
have similar components to them will use many of the same states.


As an example, if I were to step through the states on the production
of a method, regardless of the internal structure of that method, the
exit state on both should be identical (or at least, the states
leaving that exit state should be).


I've tried reading other articles about LL-class parsers, but I can't
seem to find a specific example of something similar. Perhaps I'm
looking for the wrong thing. Knowing how to refer to the kind of
parser used is helpful for discussion purposes.


Thanks,


PS: Yes I realize I'm using LL and top-down interchangeably, if this
is wrong, please let me know.


PPS: The above works a lot with paths and set construction what would
a formal description of its function look like? What am I lacking
knowledge wise so that I can describe it in a terser manner, because
long messages like this are painful to write, or read for that matter.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.