From: jamin.hanson@googlemail.com
Newsgroups: comp.compilers
Date: Fri, 14 Sep 2007 00:43:16 -0700
Organization: Compilers Central
References: 07-09-035 07-09-037 07-09-048
Keywords: lex
Posted-Date: 15 Sep 2007 15:11:17 EDT
On 13 Sep, 22:06, Joachim Durchholz <j...@durchholz.org> wrote:
> John Levine wrote:
> > [I don't understand it either. My understanding of typical REs is
> > that they special case ^ at the beginning of a pattern or chunk
> > that could match at the beginning, and $ at the end. -John]
>
> That's just implementation; the OP was after precedences.
>
> Assigning a precedence to ^ and $ does make sense. For example,
>
> ^asd|jkl
>
> could mean "asd at the beginning of the text, or jkl anywhere in the
> text", or it could mean "asd or jkl at the beginning of the text".
> PCRE says it's either ^asd or jkl, so it assigns a higher precedence to
> ^ than to |.
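The two readings quoted above can be checked against a backtracking engine. This is a small sketch using Python's `re` module as a stand-in for PCRE (an assumption: it is not the engine discussed in the thread, but it parses this pattern the same way). The explicit group in `grouped` is added here only to show the other reading.

```python
import re

# Reading 1: '^' binds tighter than '|', so only "asd" is anchored.
ungrouped = re.compile(r"^asd|jkl")

# Reading 2: the anchor applies to both alternatives (explicit group).
grouped = re.compile(r"^(?:asd|jkl)")

def found(pattern, text):
    """Return the matched text, or None if the pattern matches nowhere."""
    m = pattern.search(text)
    return m.group() if m else None

# "jkl" in the middle of the text: only the ungrouped pattern finds it.
mid_jkl_ungrouped = found(ungrouped, "xx jkl")
mid_jkl_grouped = found(grouped, "xx jkl")

# "asd" in the middle of the text: neither pattern finds it.
mid_asd_ungrouped = found(ungrouped, "xx asd")
```

If `^` had lower precedence than `|`, the ungrouped pattern would behave like the grouped one and would not find `jkl` away from the start of the text.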
The way I deal with ^ or $ being coincidental to a match is to include all
other possible inputs in the state that follows a match of ^ or $. To use
your example:
State 0:
    ^ -> State 1
    j -> State 2
State 1:
    a -> State 3
    j -> State 2
State 2:
    k -> State 4
State 3:
    s -> State 5
State 4:
    l -> State 6
State 5:
    d -> State 6
State 6:
    (end state)
The reasoning is that you can't get to match 'a' without matching '^',
but you can match 'j' even if you match '^', as the '^' is irrelevant
to the 'j' path. I'd be interested to hear what people think about
this approach.
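To make the table concrete, here is a minimal sketch in Python of a matcher driven by those seven states. The names `BOL`, `DFA`, `ACCEPT` and `match_at` are made up for illustration, and treating '^' as a pseudo-symbol that is fed to the automaton once when the start position is at the beginning of a line is an assumption about how such a table would be driven, not a description of any particular lexer generator.

```python
BOL = "^"  # hypothetical pseudo-symbol meaning "at beginning of line"

# Transition table for ^asd|jkl, copied from the states listed above.
DFA = {
    0: {BOL: 1, "j": 2},
    1: {"a": 3, "j": 2},  # 'j' is repeated here: '^' is irrelevant to jkl
    2: {"k": 4},
    3: {"s": 5},
    4: {"l": 6},
    5: {"d": 6},
    6: {},
}
ACCEPT = {6}

def match_at(text, pos):
    """Return the end index of the longest match starting at pos, or None."""
    state = 0
    at_bol = pos == 0 or text[pos - 1] == "\n"
    # Consume the '^' pseudo-symbol first if it applies at this position.
    if at_bol and BOL in DFA[state]:
        state = DFA[state][BOL]
    last_accept = None
    i = pos
    while True:
        if state in ACCEPT:
            last_accept = i
        if i >= len(text):
            break
        nxt = DFA[state].get(text[i])
        if nxt is None:
            break
        state = nxt
        i += 1
    return last_accept
```

With this table, `match_at("jkl", 0)` succeeds even though '^' was consumed first, because State 1 carries the 'j' transition, while `match_at("xasd", 1)` fails because 'a' is only reachable through State 1.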
However, the real question is this: if you allow '^' and '$' to occur
anywhere in a regex (boost::regex works this way), how do you handle '^'
and '$' clashes? You may have declared a '$' rule before a '^' rule, yet
my code always checks '^' before '$' regardless. As you have to check
both possibilities on lookup (otherwise how can you ever match them ;-) ),
the right thing to do appears to be to suppress the '^' if it occurs at
a position in the rules that a '$' has already occurred at.
Note that using Perl rules is not the answer, as lexers use leftmost-longest
matching and compile to a DFA. Perl uses leftmost precedence (the first
alternative that matches wins) and uses an NFA.
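The difference between the two disciplines shows up as soon as one alternative is a prefix of another. The sketch below uses Python's `re` module, which resolves alternation the way Perl does, and emulates the lexer-style rule by trying each alternative separately and keeping the longest result. The helper names are hypothetical, and the emulation assumes the alternatives are plain literals; it is not how a DFA-based lexer is actually built.

```python
import re

def perl_style(alternatives, text):
    """Leftmost precedence: the first alternative that matches wins."""
    m = re.match("|".join(alternatives), text)
    return m.group() if m else None

def longest_match(alternatives, text):
    """Lexer-style: the longest match wins; ties go to the earlier rule."""
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        # Strictly longer only, so an earlier rule keeps a tie.
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best

rules = ["a", "ab"]
first_wins = perl_style(rules, "ab")       # stops at the first alternative
longest_wins = longest_match(rules, "ab")  # prefers the longer rule
```

Here the Perl-style call returns "a" while the longest-match call returns "ab", which is why rule order alone cannot reproduce lexer semantics.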
Regards,
Ben