Introducing FALIB, a compiler compiler

Willow <wrschlanger@gmail.com>
Sat, 19 Dec 2009 03:54:22 -0800 (PST)

          From comp.compilers

From: Willow <wrschlanger@gmail.com>
Newsgroups: comp.compilers
Date: Sat, 19 Dec 2009 03:54:22 -0800 (PST)
Organization: Compilers Central
Keywords: tools, available
Posted-Date: 23 Dec 2009 00:53:47 EST

I just finished the first release of my very own compiler compiler,
similar to Yacc or Bison.
You can find the C++ source code at the top of this page:
http://code.google.com/p/vm64dec/downloads/list


It is called RULECOM, the Rule Compiler, and it uses FALIB, which
stands for Finite Automaton LIBrary (included). It accepts EBNF-style
grammars and converts each rule into an NFA. Lambda (empty) loops and
transitions are then removed from the NFA, and the result is converted
to a DFA for quick parsing.
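
The NFA-to-DFA step described above can be sketched in C++ roughly as
follows. This is a minimal illustration using the textbook lambda-closure
and subset construction, not FALIB's actual code: the types Nfa and Dfa
and the functions lambdaClosure, toDfa and accepts are all hypothetical
names, and symbols are plain chars for brevity.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

constexpr char kLambda = 0;  // assumed marker for a lambda (empty) edge

// Hypothetical NFA representation (not FALIB's real types).
struct Nfa {
    int start = 0;
    std::set<int> accepting;
    // edges[state] = list of (symbol, target state)
    std::vector<std::vector<std::pair<char, int>>> edges;
};

struct Dfa {
    std::set<int> accepting;                // DFA state 0 is the start state
    std::vector<std::map<char, int>> next;  // next[state][symbol] = target
};

// All NFA states reachable from `states` using only lambda edges.
std::set<int> lambdaClosure(const Nfa& nfa, std::set<int> states) {
    std::vector<int> work(states.begin(), states.end());
    while (!work.empty()) {
        int s = work.back();
        work.pop_back();
        for (const auto& [sym, to] : nfa.edges[s])
            if (sym == kLambda && states.insert(to).second)
                work.push_back(to);
    }
    return states;
}

// Subset construction: each DFA state stands for a set of NFA states.
Dfa toDfa(const Nfa& nfa) {
    assert(nfa.start >= 0 && nfa.start < static_cast<int>(nfa.edges.size()));
    Dfa dfa;
    std::map<std::set<int>, int> ids;   // subset of NFA states -> DFA state
    std::queue<std::set<int>> pending;  // subsets whose out edges are not built yet

    auto intern = [&](const std::set<int>& subset) {
        auto [it, fresh] = ids.emplace(subset, static_cast<int>(ids.size()));
        if (fresh) {
            dfa.next.emplace_back();
            for (int s : subset)
                if (nfa.accepting.count(s)) { dfa.accepting.insert(it->second); break; }
            pending.push(subset);
        }
        return it->second;
    };

    intern(lambdaClosure(nfa, {nfa.start}));
    while (!pending.empty()) {
        std::set<int> subset = pending.front();
        pending.pop();
        int from = ids.at(subset);
        // Group the non-lambda moves out of this subset by symbol.
        std::map<char, std::set<int>> moves;
        for (int s : subset)
            for (const auto& [sym, to] : nfa.edges[s])
                if (sym != kLambda) moves[sym].insert(to);
        for (const auto& [sym, targets] : moves) {
            int to = intern(lambdaClosure(nfa, targets));
            dfa.next[from][sym] = to;
        }
    }
    return dfa;
}

// Run the DFA over `input`; true if it ends in an accepting state.
bool accepts(const Dfa& dfa, const std::string& input) {
    int state = 0;
    for (char c : input) {
        auto it = dfa.next[state].find(c);
        if (it == dfa.next[state].end()) return false;
        state = it->second;
    }
    return dfa.accepting.count(state) > 0;
}
```

Feeding this an NFA for a rule shaped like `statement { statement }`
(one symbol, then a lambda loop back for repeats) yields a DFA with no
lambda edges that accepts one or more occurrences of the symbol.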


I made heavy use of DFA theory in the parser. Nonterminals in the
grammar are treated like regular symbols, although RULECOM makes sure
there are no conflicts. In the generated switch statement, each
nonterminal is expanded to the set of terminals that the nonterminal
can start with. If the next token is one of those terminals, the out
edge is taken only after a call is made to the nonterminal's own
parsing method.
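
To illustrate the switch-statement scheme described above, here is a
hand-written C++ sketch of what such generated code might look like. It
is not actual RULECOM output. It assumes a small hypothetical grammar,
`list ::= item { ',' item } ;` and `item ::= IDENT | NUMBER ;`, and the
names Token, Parser, parseItem, parseList and parseAll are invented for
this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds; real generated names will differ.
enum Token { IDENT, NUMBER, COMMA, END };

struct Parser {
    std::vector<Token> tokens;
    std::size_t pos = 0;

    // Look at the next token without consuming it.
    Token peek() const { return pos < tokens.size() ? tokens[pos] : END; }

    // item ::= IDENT | NUMBER ;
    bool parseItem() {
        int state = 0;
        for (;;) {
            switch (state) {
            case 0:
                switch (peek()) {
                case IDENT:
                case NUMBER:
                    ++pos;      // consume the terminal
                    state = 1;  // take the out edge
                    continue;
                default:
                    return false;
                }
            case 1:             // accepting state with no out edges
                return true;
            }
        }
    }

    // list ::= item { ',' item } ;
    bool parseList() {
        int state = 0;
        for (;;) {
            switch (state) {
            case 0:
                switch (peek()) {
                // The `item` edge is expanded to the terminals item can start with.
                case IDENT:
                case NUMBER:
                    if (!parseItem()) return false;  // call the nonterminal first
                    state = 1;                       // then take the out edge
                    continue;
                default:
                    return false;
                }
            case 1:             // accepting state
                switch (peek()) {
                case COMMA:
                    ++pos;
                    state = 0;
                    continue;
                default:
                    return true;  // nothing more belongs to this rule
                }
            }
        }
    }
};

// Parse a whole token sequence as a list; all tokens must be consumed.
bool parseAll(const std::vector<Token>& tokens) {
    Parser p;
    p.tokens = tokens;
    return p.parseList() && p.peek() == END;
}
```

Each rule becomes one method that walks its own DFA. The lookahead
token alone decides which edge to follow, which is why the terminals
that can start different edges out of a state must not conflict.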


I used some bootstrap code to get the process started. Now the
compiler compiler (RULECOM) is actually partly self-generated: the
grammar it uses to parse grammars is itself written in the same
grammar language!


Willow
---
Here is the compiler compiler's grammar for the grammars it accepts:


// in_rulegram.c - grammar for rulecom grammars
// Copyright (C) 2009 Willow Schlanger


// *** After bootstrap, add a ';' to end of copyright & prefix.
copyright "Copyright (C) 2009 Willow Schlanger"
prefix "rc" // for rulecom


token IDENT LITCHAR LITSTRING ;
keyword "::=" ISDEFAS ;


start ::=
statement { statement } ;


statement ::=
"token" IDENT { IDENT } ';' |
name "::=" alternate { '|' alternate } ';' |
"keyword" LITSTRING IDENT ';' |
"copyright" LITSTRING |
"prefix" LITSTRING ;


name ::= IDENT | "start" ;


alternate ::= symbol { symbol } ;


symbol ::=
item |
'{' alternate '}' |
'[' alternate ']' ;


item ::=
LITCHAR |
LITSTRING |
IDENT ; // note: IDENT may start with 0-9 too


