30 Mar 2003 00:55:44 -0500

Related articles |
---|

lexical analysis question dreder1ck@hotmail.com (2003-03-30) |

Re: lexical analysis question matkin@acm.org (Mats Kindahl) (2003-03-30) |

Re: lexical analysis question cfc@world.std.com (Chris F Clark) (2003-03-30) |

Re: lexical analysis question dreder1ck@hotmail.com (2003-04-05) |

From: | dreder1ck@hotmail.com (Drederick) |

Newsgroups: | comp.compilers |

Date: | 30 Mar 2003 00:55:44 -0500 |

Organization: | http://groups.google.com/ |

Keywords: | lex, question |

Posted-Date: | 30 Mar 2003 00:55:44 EST |

Hi, I'm trying to figure out a lexical analysis problem. I have a few

questions about it. (I think I asked on the wrong compilers newsgorup

before).

The problem: Develop an algorithm for lexical analysis (similar to

what lex does), where the input is a set of regular expressions, a

constant C, and a stream of symbols. The output is some string

signifying which regular expressions were matched, in what order.

However, this algorithm can only use a constant lookahead C, which in

this particular case means that if s[1...infinity] is the input

stream, you are not allowed to look at symbol s[i+C] or any later

symbol until you have already matched symbol s[i] to some regular

expression. Come up with the fastest algorithm possible for this.

(obviously this algorithm would not be able to handle arbitrary

regular expressions, due to the fixed lookahead. It only needs to work

for sets of regular expressions where ambiguity cannot arise)

My questions:

(1) Has this problem been studied extensively?

(2) Is the use if 'fixed lookahead' as I describe it above common? Or

does 'fixed lookahead' usually have some other meaning?

(3) If I want to get some ideas for this problem, what resources do

you suggest that I look at? I have taken a class in finite automata,

so I'm somewhat familiar with regular expressions and DFAs. I think I

understand very abstractly algorithm behind lex -- setting up the NFA

based on the regular expressions, converting this to a DFA, and then

trying to minimize this DFA. But, this doesn't really utilize the

constant C..

I am stuck on this problem -- does anyone have advice about what

avenues to persue?

-Drederick

[Unless you use the fairly disreputable trailing context feature, lex

doesn't look ahead at all. It makes a DFA which matches the union of

all of the input patterns, and keeps finding the longest match of the

input string. -John]

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.