Related articles |
---|
Introducing FALIB, a compiler compiler wrschlanger@gmail.com (Willow) (2009-12-19) |
From: | Willow <wrschlanger@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Sat, 19 Dec 2009 03:54:22 -0800 (PST) |
Organization: | Compilers Central |
Keywords: | tools, available |
Posted-Date: | 23 Dec 2009 00:53:47 EST |
I just finished the first release of my very own compiler compiler,
similar to Yacc or Bison.
You can find the C++ source code at the top of this page:
http://code.google.com/p/vm64dec/downloads/list
It is called RULECOM, the Rule Compiler and uses FALIB, which stands
for Finite Automaton LIBrary (included). It accepts EBNF-style
grammars and then converts each rule into an NFA, which has lambda
loops and transitions removed, then is converted to a DFA for quick
parsing.
I made heavy use of DFA theory in the parser. Nonterminals in the
grammar are treated like regular symbols, although it makes sure there
are no conflicts. In the generated switch statement, the nonterminal
is expanded to a series of terminals that the nonterminal can start
with. If one of the nonterminal's symbols are accepted, the out edge
is taken only after a call is made to the nonterminal method, itself.
I used some boostrap code to get the process started. Now, the
compiler compiler (RULECOM) actually is partly self-generated, the
grammar it uses to parse grammars, is a grammar in the same language!
Willow
---
Here is the compiler compiler's grammar for the grammars it accepts:
// in_rulegram.c - grammar for rulecom grammars
// Copyright (C) 2009 Willow Schlanger
// *** After bootstrap, add a ';' to end of copyright & prefix.
copyright "Copyright (C) 2009 Willow Schlanger"
prefix "rc" // for rulecom
token IDENT LITCHAR LITSTRING ;
keyword "::=" ISDEFAS ;
start ::=
statement { statement } ;
statement ::=
"token" IDENT { IDENT } ';' |
name "::=" alternate { '|' alternate } ';' |
"keyword" LITSTRING IDENT ';' |
"copyright" LITSTRING |
"prefix" LITSTRING ;
name ::= IDENT | "start" ;
alternate ::= symbol { symbol } ;
symbol ::=
item |
'{' alternate '}' |
'[' alternate ']' ;
item ::=
LITCHAR |
LITSTRING |
IDENT ; // note: IDENT may start with 0-9 too
Return to the
comp.compilers page.
Search the
comp.compilers archives again.