Re: Precedence Rules for '$' and '^'

jamin.hanson@googlemail.com
Fri, 14 Sep 2007 00:43:16 -0700

          From comp.compilers

From: jamin.hanson@googlemail.com
Newsgroups: comp.compilers
Date: Fri, 14 Sep 2007 00:43:16 -0700
Organization: Compilers Central
References: 07-09-035 07-09-037 07-09-048
Keywords: lex
Posted-Date: 15 Sep 2007 15:11:17 EDT

On 13 Sep, 22:06, Joachim Durchholz <j...@durchholz.org> wrote:
> John Levine wrote:
> > [I don't understand it either. My understanding of typical REs is
> > that they special case ^ at the beginning of a pattern or chunk
> > that could match at the beginning, and $ at the end. -John]
>
> That's just implementation; the OP was after precedences.
>
> Assigning a precedence to ^ and $ does make sense. For example,
>
> ^asd|jkl
>
> could mean "asd at the beginning of the text, or jkl anywhere in the
> text", or it could mean "asd or jkl at the beginning of the text".
> PCRE says it's either ^asd or jkl, so it assigns a higher precedence to
> ^ than to |.

The way I deal with ^ or $ coinciding with the start of other, unanchored
patterns is to include all other possible inputs in the state following a
match of ^ or $. To use your example:

State 0:
    ^ -> State 1
    j -> State 2

State 1:
    a -> State 3
    j -> State 2

State 2:
    k -> State 4

State 3:
    s -> State 5

State 4:
    l -> State 6

State 5:
    d -> State 6

State 6:
    (end state)

The reasoning is that you can't match 'a' without first matching '^',
but you can match 'j' even if you have matched '^', as the '^' is
irrelevant to the 'j' path. I'd be interested to hear what people think
about this approach.
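The table above can be exercised with a short simulation. This is a minimal sketch, not the code I actually use: it transcribes the transitions for ^asd|jkl into a dict, assumes '^' is a zero-width pseudo-input consumed only at the start of the input or just after a newline (lex's beginning-of-line convention), and returns the length of the longest match. The names BOL, DFA and longest_match are hypothetical.

```python
# Hypothetical transcription of the DFA for ^asd|jkl shown above.
# BOL is an assumed pseudo-symbol standing for the zero-width '^'.
BOL = "^"
DFA = {
    0: {BOL: 1, "j": 2},
    1: {"a": 3, "j": 2},  # 'j' copied from state 0: '^' is irrelevant to the jkl path
    2: {"k": 4},
    3: {"s": 5},
    4: {"l": 6},
    5: {"d": 6},
    6: {},
}
ACCEPT = {6}


def longest_match(text: str, pos: int) -> int:
    """Length of the longest match starting at pos, or -1 if there is none."""
    state = 0
    # Assumption: '^' holds at the start of the input or right after a newline.
    at_bol = pos == 0 or text[pos - 1] == "\n"
    if at_bol and BOL in DFA[state]:
        state = DFA[state][BOL]  # consume '^' without advancing in the text
    best = -1
    i = pos
    while True:
        if state in ACCEPT:
            best = i - pos  # remember the longest accepting prefix so far
        if i >= len(text) or text[i] not in DFA[state]:
            return best
        state = DFA[state][text[i]]
        i += 1
```

At the start of a line, "jkl" still matches because state 1 carries the copied 'j' transition, while "asd" in the middle of a line is rejected because state 0 has no 'a' transition.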

However, the real question is this: if you allow '^' and '$' to occur
anywhere in a regex (boost::regex works this way), how do you handle
clashes between '^' and '$'? You may have declared a '$' rule before a
'^' rule, yet my code always checks '^' before '$' regardless. As you
have to check both possibilities on lookup (otherwise how could you ever
match them ;-) ), the right thing to do appears to be to suppress the
'^' if it occurs at a position in the rules where a '$' has already
occurred.
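One possible reading of that suppression rule, sketched under loudly stated assumptions: "a position in the rules" is taken to mean a single DFA state, "already occurred" to mean a '$' contributed by an earlier-declared rule, and each zero-width transition leaving the state is tagged with the index of the rule it came from. resolve_anchor_clash is a hypothetical helper, not code from boost::regex or any lexer generator.

```python
from typing import Dict, List, Tuple

BOL, EOL = "^", "$"


def resolve_anchor_clash(anchors: List[Tuple[str, int]]) -> Dict[str, int]:
    """Decide which zero-width transitions out of one DFA state survive.

    anchors: (symbol, rule_index) pairs, where a lower index means the
    rule was declared earlier (an assumption of this sketch).
    Returns a map from anchor symbol to the rule index that owns it.
    """
    first_eol = min((r for s, r in anchors if s == EOL), default=None)
    kept: Dict[str, int] = {}
    for sym, rule in sorted(anchors, key=lambda a: a[1]):
        if sym == BOL and first_eol is not None and first_eol < rule:
            continue  # an earlier '$' rule already claims this position
        kept.setdefault(sym, rule)  # earliest rule wins for each symbol
    return kept
```

With a '$' from rule 0 and a '^' from rule 1 in the same state, only the '$' transition survives, so the fixed '^'-before-'$' lookup order can no longer override declaration order; if the '^' rule comes first, both are kept.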

Note that using Perl rules is not the answer, as lexers use leftmost-
longest matching and compile to a DFA, whereas Perl uses leftmost-first
(priority) matching and an NFA.
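To make the difference concrete, here is a small illustrative comparison in Python. Both helpers are hypothetical and only match at the start of the string. Python's re module is a backtracking engine that, like Perl, commits to the first alternative that succeeds; the lex-style helper instead keeps the longest prefix any rule matches and breaks ties in favour of the earliest rule.

```python
import re
from typing import List, Optional, Tuple


def leftmost_first(alternatives: List[str], text: str) -> Optional[str]:
    """Perl-style: alternatives are tried left to right; the first success wins."""
    m = re.match("|".join(f"(?:{a})" for a in alternatives), text)
    return m.group(0) if m else None


def leftmost_longest(alternatives: List[str], text: str) -> Optional[Tuple[int, str]]:
    """Lex-style: the longest prefix matched in full by any rule wins;
    ties go to the earliest-listed rule. Returns (rule_index, lexeme)."""
    for end in range(len(text), -1, -1):  # longest candidate prefix first
        for rule, alt in enumerate(alternatives):
            if re.fullmatch(alt, text[:end]):
                return rule, text[:end]
    return None
```

With ["a", "ab"] on "abc", the Perl-style helper returns "a" while the lex-style one picks rule 1 and "ab". With ["if", "[a-z]+"] on "if", the tie goes to the keyword rule, which is why lex specifications list keywords before the identifier pattern.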

Regards,

Ben

