Re: DFAs,NFAs,REs,CFGs,PDAs ARGH!

Derk Gwen <derkgwen@HotPOP.com>
10 Aug 2003 10:52:16 -0400

          From comp.compilers

Related articles
DFAs,NFAs,REs,CFGs,PDAs ARGH! tony@transCendenZ.co.uk (Anthony Webster) (2003-08-04)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! p2sam@uwaterloo.ca (2003-08-10)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! usenet0@skora.net (Thomas Skora) (2003-08-10)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! derkgwen@HotPOP.com (Derk Gwen) (2003-08-10)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! joachim.durchholz@web.de (Joachim Durchholz) (2003-08-23)
| List of all articles for this month |

From: Derk Gwen <derkgwen@HotPOP.com>
Newsgroups: comp.compilers
Date: 10 Aug 2003 10:52:16 -0400
Organization: Quick STOP Groceries
References: 03-08-015
Keywords: lex, DFA
Posted-Date: 10 Aug 2003 10:52:16 EDT

"Anthony Webster" <tony@transCendenZ.co.uk> wrote:
# Hi,
#
# I'm writing the lexer and parser parts of a compiler. I've got a basic
# lexer working but it relies on a massive while statement and I was
# thinking of using something more intesting like a DFA built from a
# RE. I'm still a little confused as to which data structure to use for
# what part of the system. I understand the mechanics of things like
# CFGs and PDAs but fail to see exactly where they fit in the
# implementation of a lexer/parser. Some clarification would be most
# appreciated.


Don't confuse the implementation of a DFA with a DFA. A DFA is an
abstraction, sets of symbols and transitions. The DFA must be
implemented somehow. A frequent implementation is


for each input symbol do
case current state in
statename:
case current input in
symbol:
do something with the input
choose the next state
...
esac
...
esac
od


The easiest transformation from an RE is to a nondeterministic finite
automaton (NFA). The NFA can then be mechanically transformed into a
DFA. DFAs are easier to implement, but can have an exponentially
larger number of states than the NFA. In a DFA each state has only
one transition for a symbol; an NFA can multiple transitions, only one
of them is correct transition, but you don't know which one.


The only data structures you need for a DFA is the current state and
current input.


For a NFA you need to keep multiple current states to keep track of
every possible choice, or you need one state variable and a stack or
other mechanism to backtrack if you made the wrong choice.




NFA, DFA, RE, and type 3 grammars are all equivalent. NPDA
(nondeterministic PDA), CFG, and type 2 grammars are all
equivalent. One equivalent form can be mechanically converted into
another equivalent form.


Skipping over the technicalities the difference between a type 3 and
type 2 language is a type 2 language can enforce paired embedding
brackets like (((...{...}...))). A type 3 language cannot enforce
that kind of distant constraint.


That means in language like Pascal that allows nested comments
{...(*..{...}...*)...}, the comments cannot be described by an RE.


Most recent programming language are specifically defined in terms of
a type 2 and type 3 grammar. The type 3 grammar is used to create the
lexer and partitions the input character string into symbols used as
terminals of the type 2 grammar: keywords, identifiers, integer or
real constants, strings, operators, and other punctuation. The type 3
grammar deals with comments and symbol deliminations that cannot be
easily expressed in Backus-Naur form grammars.


They are usually implemented by feeding a string of input characters
into a lexer finite transducer defined by an RE or type 3 grammar
which outputs symbols that are fed into the CFG parser.


--
Derk Gwen http://derkgwen.250free.com/html/index.html
Don't say anything. Especially you.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.