Derk Gwen <>
10 Aug 2003 10:52:16 -0400

          From comp.compilers

Related articles
DFAs,NFAs,REs,CFGs,PDAs ARGH! (Anthony Webster) (2003-08-04)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! (2003-08-10)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! (Thomas Skora) (2003-08-10)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! (Derk Gwen) (2003-08-10)
Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! (Joachim Durchholz) (2003-08-23)
| List of all articles for this month |

From: Derk Gwen <>
Newsgroups: comp.compilers
Date: 10 Aug 2003 10:52:16 -0400
Organization: Quick STOP Groceries
References: 03-08-015
Keywords: lex, DFA
Posted-Date: 10 Aug 2003 10:52:16 EDT

"Anthony Webster" <> wrote:
# Hi,
# I'm writing the lexer and parser parts of a compiler. I've got a basic
# lexer working but it relies on a massive while statement and I was
# thinking of using something more intesting like a DFA built from a
# RE. I'm still a little confused as to which data structure to use for
# what part of the system. I understand the mechanics of things like
# CFGs and PDAs but fail to see exactly where they fit in the
# implementation of a lexer/parser. Some clarification would be most
# appreciated.

Don't confuse the implementation of a DFA with a DFA. A DFA is an
abstraction, sets of symbols and transitions. The DFA must be
implemented somehow. A frequent implementation is

for each input symbol do
case current state in
case current input in
do something with the input
choose the next state

The easiest transformation from an RE is to a nondeterministic finite
automaton (NFA). The NFA can then be mechanically transformed into a
DFA. DFAs are easier to implement, but can have an exponentially
larger number of states than the NFA. In a DFA each state has only
one transition for a symbol; an NFA can multiple transitions, only one
of them is correct transition, but you don't know which one.

The only data structures you need for a DFA is the current state and
current input.

For a NFA you need to keep multiple current states to keep track of
every possible choice, or you need one state variable and a stack or
other mechanism to backtrack if you made the wrong choice.

NFA, DFA, RE, and type 3 grammars are all equivalent. NPDA
(nondeterministic PDA), CFG, and type 2 grammars are all
equivalent. One equivalent form can be mechanically converted into
another equivalent form.

Skipping over the technicalities the difference between a type 3 and
type 2 language is a type 2 language can enforce paired embedding
brackets like (((...{...}...))). A type 3 language cannot enforce
that kind of distant constraint.

That means in language like Pascal that allows nested comments
{...(*..{...}...*)...}, the comments cannot be described by an RE.

Most recent programming language are specifically defined in terms of
a type 2 and type 3 grammar. The type 3 grammar is used to create the
lexer and partitions the input character string into symbols used as
terminals of the type 2 grammar: keywords, identifiers, integer or
real constants, strings, operators, and other punctuation. The type 3
grammar deals with comments and symbol deliminations that cannot be
easily expressed in Backus-Naur form grammars.

They are usually implemented by feeding a string of input characters
into a lexer finite transducer defined by an RE or type 3 grammar
which outputs symbols that are fed into the CFG parser.

Derk Gwen
Don't say anything. Especially you.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.