Lexing overlapping patterns

pollarda@hawaii.edu (Art Pollard)
14 Jan 1998 15:24:56 -0500

          From comp.compilers

From: pollarda@hawaii.edu (Art Pollard)
Newsgroups: comp.compilers
Date: 14 Jan 1998 15:24:56 -0500
Organization: University of Hawaii
Keywords: lex, question

Hi.


I am trying to figure out a way to match a series of overlapping patterns
using lex. (Yes, I didn't think this would be easy.)


For example, in RTF there might be a sequence such as this:


1234/bold56789


Where this should be matched as:


123456789
and
/bold


or the other way around.


Of course, the matched patterns need to be passed on in a way that keeps
them related to each other (by proximity or some other association).
This rules out lexing the input twice in two separate passes.
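One possible single-pass arrangement, given only as a rough sketch: append every digit run to one buffer, and when a control word turns up, record how many digits had been collected at that point. That offset is what ties the two overlapping results together. The sketch assumes flex (`%option noyywrap` is a flex feature); the names `number`, `used`, and `flush_number` and the 256-byte limit are made up for illustration.

```lex
%{
/* Sketch: join digit runs across control words in a single pass. */
#include <stdio.h>
#include <string.h>

static char number[256];   /* digits joined across control words */
static size_t used = 0;    /* how many digits are in the buffer */

/* Print the joined number, if any, and reset the buffer. */
static void flush_number(void)
{
    if (used > 0)
        printf("number %s\n", number);
    used = 0;
    number[0] = '\0';
}
%}

%option noyywrap

%%
[0-9]+      {
                /* Append this run; digits that do not fit are dropped. */
                size_t room = sizeof number - 1 - used;
                size_t n = (size_t)yyleng < room ? (size_t)yyleng : room;
                memcpy(number + used, yytext, n);
                used += n;
                number[used] = '\0';
            }
"/"[a-z]+   { printf("control %s at digit offset %zu\n", yytext, used); }
\n          { flush_number(); }
.           { /* ignore anything else in this sketch */ }
%%

int main(void)
{
    yylex();
    flush_number();   /* covers input that lacks a final newline */
    return 0;
}
```

Built with something like `flex scan.l && cc lex.yy.c -o scan` and fed the line `1234/bold56789`, this should print `control /bold at digit offset 4` followed by `number 123456789`. A real scanner would return tokens to a parser instead of printing.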


If anyone has ever wanted to do such a thing (besides me) and found a way
around this limitation, I would love to hear from you.


-Art
--
Art Pollard <PollardA@Hawaii.edu>
Moderator for Comp.Theory.Info-Retrieval
List Maintainer for the Hyper-Theory (Hypertext Theory) mailing list.
[Actually, it's not all that hard. Lex provides trailing context patterns
like "foo/bar", which means match foo only if bar follows, but leave bar for
a subsequent match. You can also use yyless() in action code to give back
trailing parts of the token you just matched. For leading context, use %x
exclusive start states. I don't really understand your RTF example; that
would be 1234 followed by a /bold keyword with a numeric argument of 56789.
RTF is easy to lex; it's making sense of the 1.5 billion poorly defined
tokens that's the problem. -John]
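
The three mechanisms named in the note can be put into one small flex specification. This is a contrived sketch rather than a real RTF lexer: it uses the "/" prefix from the example above, the start state name `ARG` is invented, and the control-word rule deliberately over-matches so that yyless() has something to give back.

```lex
%{
#include <stdio.h>
%}

%option noyywrap
%x ARG

%%
[0-9]+/"/"        {
                      /* Trailing context: digits only when "/" follows.
                         The "/" itself stays in the input. */
                      printf("number before a control word: %s\n", yytext);
                  }
[0-9]+            { printf("number: %s\n", yytext); }
"/"[a-z]+[0-9]*   {
                      /* Matched the name plus any digits, e.g. "/bold56789".
                         Keep "/name" and hand the digits back. */
                      int n = 1;
                      while (yytext[n] >= 'a' && yytext[n] <= 'z')
                          n++;
                      yyless(n);
                      printf("control word: %s\n", yytext);
                      BEGIN(ARG);
                  }
<ARG>[0-9]+       {
                      /* Leading context: in ARG, digits are an argument. */
                      printf("argument: %s\n", yytext);
                      BEGIN(INITIAL);
                  }
<ARG>.|\n         { yyless(0); BEGIN(INITIAL); /* no argument; rescan */ }
.|\n              { /* ignore anything else in this sketch */ }
%%

int main(void)
{
    yylex();
    return 0;
}
```

On the input `1234/bold56789` this should report `1234` as a number before a control word, then `/bold` as the control word, then `56789` as its argument, which is the reading given in the note. Because `ARG` is declared with `%x`, the plain `[0-9]+` rules are inactive while it is in effect.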





