10 Aug 2003 10:52:16 -0400

Related articles |
---|

DFAs,NFAs,REs,CFGs,PDAs ARGH! tony@transCendenZ.co.uk (Anthony Webster) (2003-08-04) |

Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! p2sam@uwaterloo.ca (2003-08-10) |

Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! usenet0@skora.net (Thomas Skora) (2003-08-10) |

Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! derkgwen@HotPOP.com (Derk Gwen) (2003-08-10) |

Re: DFAs,NFAs,REs,CFGs,PDAs ARGH! joachim.durchholz@web.de (Joachim Durchholz) (2003-08-23) |

From: | Derk Gwen <derkgwen@HotPOP.com> |

Newsgroups: | comp.compilers |

Date: | 10 Aug 2003 10:52:16 -0400 |

Organization: | Quick STOP Groceries |

References: | 03-08-015 |

Keywords: | lex, DFA |

Posted-Date: | 10 Aug 2003 10:52:16 EDT |

"Anthony Webster" <tony@transCendenZ.co.uk> wrote:

# Hi,

#

# I'm writing the lexer and parser parts of a compiler. I've got a basic

# lexer working but it relies on a massive while statement and I was

# thinking of using something more intesting like a DFA built from a

# RE. I'm still a little confused as to which data structure to use for

# what part of the system. I understand the mechanics of things like

# CFGs and PDAs but fail to see exactly where they fit in the

# implementation of a lexer/parser. Some clarification would be most

# appreciated.

Don't confuse the implementation of a DFA with a DFA. A DFA is an

abstraction, sets of symbols and transitions. The DFA must be

implemented somehow. A frequent implementation is

for each input symbol do

case current state in

statename:

case current input in

symbol:

do something with the input

choose the next state

...

esac

...

esac

od

The easiest transformation from an RE is to a nondeterministic finite

automaton (NFA). The NFA can then be mechanically transformed into a

DFA. DFAs are easier to implement, but can have an exponentially

larger number of states than the NFA. In a DFA each state has only

one transition for a symbol; an NFA can multiple transitions, only one

of them is correct transition, but you don't know which one.

The only data structures you need for a DFA is the current state and

current input.

For a NFA you need to keep multiple current states to keep track of

every possible choice, or you need one state variable and a stack or

other mechanism to backtrack if you made the wrong choice.

NFA, DFA, RE, and type 3 grammars are all equivalent. NPDA

(nondeterministic PDA), CFG, and type 2 grammars are all

equivalent. One equivalent form can be mechanically converted into

another equivalent form.

Skipping over the technicalities the difference between a type 3 and

type 2 language is a type 2 language can enforce paired embedding

brackets like (((...{...}...))). A type 3 language cannot enforce

that kind of distant constraint.

That means in language like Pascal that allows nested comments

{...(*..{...}...*)...}, the comments cannot be described by an RE.

Most recent programming language are specifically defined in terms of

a type 2 and type 3 grammar. The type 3 grammar is used to create the

lexer and partitions the input character string into symbols used as

terminals of the type 2 grammar: keywords, identifiers, integer or

real constants, strings, operators, and other punctuation. The type 3

grammar deals with comments and symbol deliminations that cannot be

easily expressed in Backus-Naur form grammars.

They are usually implemented by feeding a string of input characters

into a lexer finite transducer defined by an RE or type 3 grammar

which outputs symbols that are fed into the CFG parser.

--

Derk Gwen http://derkgwen.250free.com/html/index.html

Don't say anything. Especially you.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.