5 Apr 2003 15:40:53 -0500

Related articles |
---|

lexical analysis question dreder1ck@hotmail.com (2003-03-30) |

Re: lexical analysis question matkin@acm.org (Mats Kindahl) (2003-03-30) |

Re: lexical analysis question cfc@world.std.com (Chris F Clark) (2003-03-30) |

Re: lexical analysis question dreder1ck@hotmail.com (2003-04-05) |

From: | dreder1ck@hotmail.com (Drederick) |

Newsgroups: | comp.compilers,comp.theory |

Date: | 5 Apr 2003 15:40:53 -0500 |

Organization: | http://groups.google.com/ |

References: | 03-03-178 03-03-191 |

Keywords: | lex |

Posted-Date: | 05 Apr 2003 15:40:53 EST |

Chris F Clark <cfc@world.std.com> wrote in message news:03-03-191...

*> I am hoping this is not a homework problem. It is just contrived*

*> enough to be a homework problem in an automata class and represent a*

*> special class of machines that I'm not aware of. However, it is also*

*> possible you are trying to solve an "interesting" problem.*

Well, it is not a 'normal' homework problem. I asked my professor if

he would think of an interesting problem for me and this was the

result. I will probably end up doing some sort of project related to

it, unless I discover that it isn't interesting. (hence why my

original message was not soliciting solutions).

*> Does it also mean that no string matching E[i] is a prefix of a string*

*> matching E[j] (and vice versa)? Is it possible for a prefix of string*

*> matching E[j] to match a suffix of a string matching E[i]?*

The situation in your first sentence is possible. The algorithm always

matches the longest string possible which matches some regular

expression (starting from the beginning of the yet unmatched symbols).

*> Next, can your stream of symbols contain errors, sequences of symbols*

*> that match no regular expression or is it error-free?*

The sequence is error-free.

*> If the regular expressions, while unambiguous, have prefixes that can*

*> match suffixes of other strings (an "overlapping ambiguity"), your*

*> constant bound of C becomes relevant in that one cannot distinuguish*

*> "overlap ambiguities" that are longer than C.*

Indeed.

*> E[0]: a b+;*

*> E[1]: b c;*

*>*

*> s = a b b c*

*>*

*> correct results: E[0]->a b, E[1]->b c*

*> longest match results: E[0]->a b b, error no match for "c"*

OK. The algorithm I envision is supposed to greedily match E[0] and

then give the error.

*> You can even find sets of regular expressions where 3 or more of the*

*> rules interact to determine the end of the first regular expression.*

*>*

*> E[0]: a b+;*

*> E[1]: b c;*

*> E[2]: c c d;*

*>*

*> s = a b b c c d => E[0]->a b b, E[2]->c c d*

*> s = a b b c c c d => E[0]->a b, E[1]->b c, E[2]->c c d*

*>*

*> In this case, you don't know when to match E[0] until you have seen*

*> not only a "c", but as many as 3 characters following its end to see*

*> if the sequence is "c c d".*

Hmm, that general problem does look nasty.

*> In case this is a homework problem, can you show that for any given*

*> set E, how many lookahead symbols are required to determine the end of*

*> any given regular expression?*

I think I see what you mean.. my attempt is below

*> Is there an algorithm for computing it?*

*> Or, is there a counter-example which requires unbounded lookahead*

*> beyond the end of a regular expression to determine when the regular*

*> expression is matched (and is not part of the overlap with some other*

*> regular expression)?*

It seems like a counterexample is:

E[0] = ab+

E[1] = bc+d

E[2] = bbc+e

input: abbbcccccccc<arbitrarily many c symbols>cccd

The algorithm wouldn't know whether to match ab or abb until it found

the letter after the last c symbol, and we cant put a limit on the #

of c symbols. Was that the right interpretation of your question?

[snip]

*> Hope this helps,*

*> -Chris*

Yes, it was very helpful. Thank you for the detailed response. Do you

think most compiler textbooks would give me the background needed to

solve this problem? Are there any particular books or references that

you think would be good, so I don't have to spend time re-inventing

theory that is not very unique to this problem?

-Drederick

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.