Writing a parser/lexical analyzer builder.

"Alexander Morou" <alex@alexandermorou.com>
Fri, 26 Oct 2007 01:26:05 -0500


Related articles
Writing a parser/lexical analyzer builder. alex@alexandermorou.com (Alexander Morou) (2007-10-26)
Re: Writing a parser/lexical analyzer builder. DrDiettrich1@aol.com (Hans-Peter Diettrich) (2007-10-27)
From: "Alexander Morou" <alex@alexandermorou.com>
Newsgroups: comp.compilers
Date: Fri, 26 Oct 2007 01:26:05 -0500
Organization: Compilers Central
Keywords: parse, tools, question
Posted-Date: 26 Oct 2007 09:39:12 EDT

Greetings,


I'm attempting to build a parser/lexical analyzer builder in C# on the
.NET Framework 2.0. Right now I'm in the conceptualization stage, and
I wanted to get a bit of insight before I get too far into building
code that might just blow up in my face.


I could easily write a simple parser that understands EBNF or BNF,
but I want to ease the writing of grammar description files for
complex systems (say, C#: its expression system alone has eleven
different precedence levels, not just operators, -precedence
levels-). To do so I'm introducing my own variant of templates into
the system.


I have defined a sample grammar in the following file:
http://lhq.rpgsource.net/text/csExpressions.oilexer


The project intends to use recursive descent to handle parses. I
realize that a left-recursive rule such as AddExp ::= AddExp
AddOperators MulDivExp | MulDivExp will, in a naive recursive-descent
parser, recurse forever without precautionary measures. To address
this I decided to add a '_Continuous' parse case to self-referencing
First-targets, and as a rough mockup I decided upon:
http://lhq.rpgsource.net/text/aeTest.txt
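The standard fix is to rewrite the left-recursive rule as iteration:
AddExp ::= MulDivExp (AddOperators MulDivExp)*. A minimal sketch of
that transformation in a hand-written recursive-descent parser (in
Java here for illustration; the token stream and method names are
hypothetical, not taken from the grammar file above):

```java
import java.util.List;

// Sketch of left-recursion elimination in recursive descent.
// Grammar:    AddExp ::= AddExp AddOperators MulDivExp | MulDivExp
// rewritten:  AddExp ::= MulDivExp (AddOperators MulDivExp)*
public class AddExpParser {
    private final List<String> tokens; // hypothetical pre-lexed token stream
    private int pos = 0;

    public AddExpParser(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }
    private String next() { return tokens.get(pos++); }

    // AddExp: parse one MulDivExp, then loop while an add-operator follows.
    // The loop replaces the left-recursive self-reference, so no unbounded
    // recursion can occur.
    public int parseAddExp() {
        int left = parseMulDivExp();
        while ("+".equals(peek()) || "-".equals(peek())) {
            String op = next();
            int right = parseMulDivExp();
            left = op.equals("+") ? left + right : left - right;
        }
        return left;
    }

    // MulDivExp: same pattern, one precedence level down.
    private int parseMulDivExp() {
        int left = parsePrimary();
        while ("*".equals(peek()) || "/".equals(peek())) {
            String op = next();
            int right = parsePrimary();
            left = op.equals("*") ? left * right : left / right;
        }
        return left;
    }

    // Primary: just integer literals, to keep the sketch short.
    private int parsePrimary() { return Integer.parseInt(next()); }

    public static void main(String[] args) {
        AddExpParser p = new AddExpParser(List.of("2", "+", "3", "*", "4", "-", "5"));
        System.out.println(p.parseAddExp()); // 2 + 12 - 5 = 9
    }
}
```

The iteration also makes the operators left-associative, which is what
the original left-recursive rule expressed.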


Here AddExp references another production rule, and MulDiv references
a token. You'll note the differences in their proposed
implementations: one strictly uses lookahead from the local
tokenizer, while the other calls a parse method to determine the
same. Remember, the above code is just something I threw together to
presumably solve the issue; I haven't verified it, because I want to
ensure I'm taking the project in the right direction before I code en
masse.
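The two continuation strategies can be contrasted concretely: a
single-token lookahead check versus a trial parse that backtracks on
failure. A sketch (the names `continuesByLookahead` and
`tryParseNumber` are illustrative assumptions, not from the mockup):

```java
import java.util.List;

// Sketch of the two ways to decide a '_Continuous' case:
// (a) inspect one token of lookahead, (b) attempt a sub-parse and
// backtrack on failure. Names here are illustrative only.
public class ContinuationDemo {
    private final List<String> tokens;
    private int pos = 0;

    public ContinuationDemo(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : null; }

    // (a) Token lookahead: cheap, and sufficient whenever a single
    // token class (here, an add-operator) decides whether the rule
    // continues.
    public boolean continuesByLookahead() {
        return "+".equals(peek()) || "-".equals(peek());
    }

    // (b) Trial parse: save the position, attempt the sub-rule, and
    // restore the position on failure, leaving the stream untouched.
    // Needed when no fixed amount of lookahead can decide.
    public Integer tryParseNumber() {
        int saved = pos;
        try {
            return Integer.parseInt(tokens.get(pos++));
        } catch (RuntimeException e) {
            pos = saved; // backtrack
            return null;
        }
    }

    public static void main(String[] args) {
        ContinuationDemo d = new ContinuationDemo(List.of("+", "7"));
        System.out.println(d.continuesByLookahead()); // true
        System.out.println(d.tryParseNumber());       // null: "+" is not a number
        System.out.println(d.continuesByLookahead()); // still true: position was restored
    }
}
```

For a grammar that stays LL(1), strategy (a) is all the generated
parser needs; trial parsing only pays off where the grammar genuinely
requires backtracking.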


The reason I'm posting: an online associate of mine said that
extensions to an already established norm (EBNF) are superfluous. Is
there any use in adding templates, allowing tokens to be categorized
to make rules easier to write, and other changes? I don't want to
waste my time creating a grammar-description format that merely
obfuscates and ends up unusable. If the idea is useful, can it be
cleaned up, or is what I have even viable at all?


Thanks in advance,


-Alexander Morou

