Lexing overlapping patterns email@example.com (1998-01-14)
From: firstname.lastname@example.org (Art Pollard)
Date: 14 Jan 1998 15:24:56 -0500
Organization: University of Hawaii
I am trying to figure out a way to match a series of overlapping patterns
using lex. (Yes, I didn't think this would be easy.)
For example, take RTF, where there might be a sequence
such as this:
This should be matched as:
or the other way around.
Of course, the matched patterns need to be passed on in a way that keeps
them related to each other (by proximity or some other association),
which rules out lexing the input twice in two separate passes.
If anyone has ever wanted to do such a thing (besides me) and found a way
around this limitation, I would love to hear from you.
Art Pollard <PollardA@Hawaii.edu>
Moderator for Comp.Theory.Info-Retrieval
List Maintainer for the Hyper-Theory (Hypertext Theory) mailing list.
[Actually, it's not all that hard. Lex provides trailing context patterns
like "foo/bar", which means match foo only if bar follows, but leave bar for
a subsequent match. You can also use yyless() in action code to give back
trailing parts of the token you just matched. For leading context, use %x
exclusive start states. I don't really understand your RTF example; that
would be 1234 followed by a \bold keyword with a numeric argument of 56789.
RTF is easy to lex; it's making sense of the 1.5 billion poorly defined
tokens that's the problem. -John]