lexical analysis question

dreder1ck@hotmail.com (Drederick)
30 Mar 2003 00:55:44 -0500

          From comp.compilers

Related articles
lexical analysis question dreder1ck@hotmail.com (2003-03-30)
Re: lexical analysis question matkin@acm.org (Mats Kindahl) (2003-03-30)
Re: lexical analysis question cfc@world.std.com (Chris F Clark) (2003-03-30)
Re: lexical analysis question dreder1ck@hotmail.com (2003-04-05)
| List of all articles for this month |

From: dreder1ck@hotmail.com (Drederick)
Newsgroups: comp.compilers
Date: 30 Mar 2003 00:55:44 -0500
Organization: http://groups.google.com/
Keywords: lex, question
Posted-Date: 30 Mar 2003 00:55:44 EST

Hi, I'm trying to figure out a lexical analysis problem. I have a few
questions about it. (I think I asked on the wrong compilers newsgorup

The problem: Develop an algorithm for lexical analysis (similar to
what lex does), where the input is a set of regular expressions, a
constant C, and a stream of symbols. The output is some string
signifying which regular expressions were matched, in what order.
However, this algorithm can only use a constant lookahead C, which in
this particular case means that if s[1...infinity] is the input
stream, you are not allowed to look at symbol s[i+C] or any later
symbol until you have already matched symbol s[i] to some regular
expression. Come up with the fastest algorithm possible for this.

(obviously this algorithm would not be able to handle arbitrary
regular expressions, due to the fixed lookahead. It only needs to work
for sets of regular expressions where ambiguity cannot arise)

My questions:

(1) Has this problem been studied extensively?

(2) Is the use if 'fixed lookahead' as I describe it above common? Or
does 'fixed lookahead' usually have some other meaning?

(3) If I want to get some ideas for this problem, what resources do
you suggest that I look at? I have taken a class in finite automata,
so I'm somewhat familiar with regular expressions and DFAs. I think I
understand very abstractly algorithm behind lex -- setting up the NFA
based on the regular expressions, converting this to a DFA, and then
trying to minimize this DFA. But, this doesn't really utilize the
constant C..

I am stuck on this problem -- does anyone have advice about what
avenues to persue?

[Unless you use the fairly disreputable trailing context feature, lex
doesn't look ahead at all. It makes a DFA which matches the union of
all of the input patterns, and keeps finding the longest match of the
input string. -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.