Sat, 13 Feb 2010 20:06:01 +0000 (UTC)

Related articles |
---|

Limits of regular languages e0726905@student.tuwien.ac.at (e0726905) (2010-02-11) |

Re: Limits of regular languages ott@mirix.org (Matthias-Christian Ott) (2010-02-13) |

Re: Limits of regular languages kkylheku@gmail.com (Kaz Kylheku) (2010-02-13) |

Re: Limits of regular languages cfc@shell01.TheWorld.com (Chris F Clark) (2010-02-13) |

Re: Limits of regular languages max@gustavus.edu (Max Hailperin) (2010-02-13) |

Re: Limits of regular languages e0726905@student.tuwien.ac.at (e0726905) (2010-02-27) |

From: | Kaz Kylheku <kkylheku@gmail.com> |

Newsgroups: | comp.compilers |

Date: | Sat, 13 Feb 2010 20:06:01 +0000 (UTC) |

Organization: | A noiseless patient Spider |

References: | 10-02-052 |

Keywords: | lex, DFA |

Posted-Date: | 13 Feb 2010 19:31:20 EST |

On 2010-02-11, e0726905 <e0726905@student.tuwien.ac.at> wrote:

*> Hi,*

*>*

*> I have been playing around with regular expressions for the last few*

*> weeks, implementing the automaton generation algorithms from the dragon*

*> book and taking at the existing tools.*

*> Now a few questions have sprung, that I haven't been able to answer:*

*>*

*> Is there a way implement the usual longest-match behaviour of a scanner*

*> with a DFA without any backtracking mechanisms, meaning is that*

*> behaviour strictly regualr or not?*

You mean without recording the last good match, and reverting to that

piece of memory when reaching a state transition impasse?

Okay, first of all, it should be obvious that longest extraction

is not set membership testing!

The longest extraction scanner does something other than report ``does

this particular input string S lie within the language L''. Namely,

the scanner reports that ``a length N prefix of the input sequence

lies in the language L (and there is no such prefix longer than N)''.

Therefore, the scanner needs to be able to count, and it needs to be

able to count indefinitely high, since in general there is no upper

bound on N, which means that it's not mathematically a finite state

automaton.

(In practice, scanners limit token length; but so do parsers

limit recursion depth too.)

The longest-match behavior is achieved in practice by counting the

input, and at every acceptance state, recording a copy of the counter

as a ``longest match seen so far'' variable. When the matching

terminates (there is no more input, or the machine indicates that no

transitions are possible), the value of that variable is reported.

*> I also have two rather unrelated questions that have been bugging me for*

*> quite some time:*

*>*

*> Is every regular language LL(1)?*

Yes.

*> What class of language do regexes with backreferences fall into?*

*> Membership is NP-Complete so they are not in CFG (or are they? assuming*

*> P=NP?).*

In backreferences, prior input defines the grammar which matches new

input. So this is quite a different animal from the context free

grammar, which is purely a static structure, not influenced by input.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.