Related articles |
---|
Precedence Rules for '$' and '^' jamin.hanson@googlemail.com (2007-09-12) |
Re: Precedence Rules for '$' and '^' jo@durchholz.org (Joachim Durchholz) (2007-09-13) |
Re: Precedence Rules for '$' and '^' jo@durchholz.org (Joachim Durchholz) (2007-09-13) |
Re: Precedence Rules for '$' and '^' jamin.hanson@googlemail.com (2007-09-14) |
Re: Precedence Rules for '$' and '^' rsc@swtch.com (Russ Cox) (2007-09-14) |
Re: Precedence Rules for '$' and '^' jo@durchholz.org (Joachim Durchholz) (2007-09-15) |
Re: Precedence Rules for '$' and '^' cfc@shell01.TheWorld.com (Chris F Clark) (2007-09-17) |
Re: Precedence Rules for '$' and '^' jamin.hanson@googlemail.com (2007-09-17) |
From: | jamin.hanson@googlemail.com |
Newsgroups: | comp.compilers |
Date: | Mon, 17 Sep 2007 00:46:54 -0700 |
Organization: | Compilers Central |
References: | 07-09-03507-09-048 07-09-052 07-09-061 |
Keywords: | lex |
Posted-Date: | 18 Sep 2007 08:10:16 EDT |
On 15 Sep, 21:41, Joachim Durchholz <j...@durchholz.org> wrote:
> jamin.han...@googlemail.com schrieb:
>
> > However, the real question is that if you allow '^' and '$' to occur
> > anywhere in a regex (boost::regex works this way),
>
> I may be missing something, but it seems to me that such a rule
> wouldn't match anything if it has a nonempty pattern before the ^ or
> after the $.
>
> I.e. asd^jkl, while a perfectly valid regex, will never match, or will it?
No that regex won't ever match.
> > how you handle '^'
>
> > and '$' clashes, because you may have declared a '$' rule before a '^'
> > rule, yet my code always checks '^' before '$' regardless.
>
> Some example would help.
I will have to hunt around for some good examples! Do you know of any
example lex specs what actually use ^ and $ at all?
>Rule order definitely doesn't affect lexing,
> at least not unless you provide mechanisms that go beyond regular
> expressions.
Yes they do! Rule ordering is used to determine which rule matched in
the event of ambiguity. That's why the more general rules such as [a-
z]+ (for example) come AFTER keywords in lex specs.
> > As you have to check both possibilities on lookup (otherwise how can
> > you ever match them ;-) ), the right thing to do appears to be to
> > suppress the '^' if it occurs at a position in the rules that a '$'
> > has already occurred at.
>
> In my book, the Right Thing would be to spit out a warning somewhere.
As I mentioned, boost::regex deals with ^ and $ anywhere in a regex
without complaining.
> > Note that using PERL rules is not the answer, as lexers use left-most
> > longest and compile to a DFA. PERL uses leftmost precedence and uses
> > NFA.
>
> I'm not sure how "leftmost longest" and "leftmost precedence" relate.
Here's the difference: [-+]?(\d+[.]?|\d*[.]\d+) in PERL will match 0.
when presented with 0.1
This is because it gives up as soon as the leftmost rule matches.
This is in contrast with a leftmost longest strategy (typically used
with a DFA representation) which will keep going until it can't match
anymore regardless of the ordering of the ored expressions.
> DFA and NFA are certainly not at odds, since they are equivalent.
In theory, but not the way PERL implements lookup.
> I'd still look at Perl whenever I'm unsure, simply because that's how
> most people expect regexes to work. That doesn't mean you have to do
> everything as Perl does, particularly if you have good reasons to do
> otherwise :-)
PERL is the wrong model for lexers. Extended POSIX is closer.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.