Re: Precedence Rules for '$' and '^'

jamin.hanson@googlemail.com
Fri, 14 Sep 2007 00:43:16 -0700

          From comp.compilers

From: jamin.hanson@googlemail.com
Newsgroups: comp.compilers
Date: Fri, 14 Sep 2007 00:43:16 -0700
Organization: Compilers Central
References: 07-09-035 07-09-037 07-09-048
Keywords: lex
Posted-Date: 15 Sep 2007 15:11:17 EDT

On 13 Sep, 22:06, Joachim Durchholz <j...@durchholz.org> wrote:
> John Levine wrote:
> > [I don't understand it either. My understanding of typical REs is
> > that they special case ^ at the beginning of a pattern or chunk
> > that could match at the beginning, and $ at the end. -John]
>
> That's just implementation; the OP was after precedences.
>
> Assigning a precedence to ^ and $ does make sense. For example,
>
> ^asd|jkl
>
> could mean "asd at the beginning of the text, or jkl anywhere in the
> text", or it could mean "asd or jkl at the beginning of the text".
> PCRE says it's either ^asd or jkl, so it assigns a higher precedence to
> ^ than to |.
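A quick illustration of that precedence, assuming Python's `re` module (which reads `^asd|jkl` the same way PCRE does, as `(^asd)|(jkl)`); the names `pat` and `grouped` are only for the example:

```python
import re

# '^' binds tighter than '|', so this means (^asd) | (jkl).
pat = re.compile(r'^asd|jkl')
print(bool(pat.search('asd...')))   # True: asd at the start
print(bool(pat.search('...jkl')))   # True: jkl anywhere
print(bool(pat.search('...asd')))   # False: asd not at the start

# The other reading, "asd or jkl at the start", needs explicit grouping.
grouped = re.compile(r'^(?:asd|jkl)')
print(bool(grouped.search('...jkl')))  # False: jkl must now be at the start
```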


The way I deal with a ^ or $ that applies to only some of the
alternatives is to include all other possible inputs in the state
following a match of ^ or $. To use your example:


State 0:
    ^ -> State 1
    j -> State 2

State 1:
    a -> State 3
    j -> State 2

State 2:
    k -> State 4

State 3:
    s -> State 5

State 4:
    l -> State 6

State 5:
    d -> State 6

State 6:
    (end state)


The reasoning is that you can't match 'a' without first matching '^',
but you can match 'j' even if you have matched '^', because the '^' is
irrelevant to the 'j' path. I'd be interested to hear what people think
about this approach.
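The state table above can be sketched as a small table-driven matcher. This is a minimal sketch, assuming '^' is fed to the DFA as a pseudo-input only at position 0 of the text (a real lexer would more likely do it at every line start); `BOL`, `DFA`, `match_at` and `search` are hypothetical names. `BOL` is a multi-character string so it can never collide with a literal '^' in the input:

```python
# Transition table for ^asd|jkl, copied from the state listing above.
# BOL is a hypothetical pseudo-symbol standing for a matched '^'.
BOL = '<BOL>'
DFA = {
    0: {BOL: 1, 'j': 2},
    1: {'a': 3, 'j': 2},  # 'j' repeated: '^' is irrelevant to the jkl path
    2: {'k': 4},
    3: {'s': 5},
    4: {'l': 6},
    5: {'d': 6},
}
ACCEPT = {6}

def match_at(text: str, pos: int) -> bool:
    """Run the DFA on text from pos; True if an accepting state is reached."""
    state = 0
    # Assumption: '^' holds only at the very start of the text.
    if pos == 0 and BOL in DFA[state]:
        state = DFA[state][BOL]
    i = pos
    while state not in ACCEPT:
        if i >= len(text):
            return False
        nxt = DFA.get(state, {}).get(text[i])
        if nxt is None:
            return False
        state = nxt
        i += 1
    return True

def search(text: str) -> bool:
    """True if ^asd|jkl matches anywhere in text."""
    return any(match_at(text, p) for p in range(len(text)))

print(search('asd'))    # True
print(search('jkl'))    # True: reached via State 1 after '^'
print(search('xxjkl'))  # True
print(search('xxasd'))  # False
```

Without the extra `'j': 2` entry in State 1, `search('jkl')` would fail, because position 0 always consumes the '^' pseudo-input first.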


However, the real question is this: if you allow '^' and '$' to occur
anywhere in a regex (boost::regex works this way), how do you handle
clashes between '^' and '$'? You may have declared a '$' rule before a
'^' rule, yet my code always checks '^' before '$' regardless. As you
have to check both possibilities on lookup (otherwise how could you ever
match them? ;-) ), the right thing to do appears to be to suppress the
'^' if it occurs at a position in the rules where a '$' has already
occurred.
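One possible reading of that suppression rule, sketched in Python under stated assumptions: each DFA position keeps its anchor edges in rule-declaration order, and at lookup time '^' holds at a line start and '$' at a line end, so both hold on an empty line, which is where the clash appears. The helpers `resolve_anchor_edges` and `take_anchor` are hypothetical and are not taken from the author's code:

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, int]  # (anchor, target_state); anchor is '^' or '$'

def resolve_anchor_edges(edges: List[Edge]) -> List[Edge]:
    """Drop any '^' edge that follows a '$' edge at the same position.
    Assumes `edges` is in rule-declaration order."""
    kept: List[Edge] = []
    seen_dollar = False
    for anchor, target in edges:
        if anchor == '$':
            seen_dollar = True
        elif anchor == '^' and seen_dollar:
            continue  # suppressed: an earlier rule already put '$' here
        kept.append((anchor, target))
    return kept

def take_anchor(edges: List[Edge], at_bol: bool, at_eol: bool) -> Optional[int]:
    """Return the target of the first edge whose anchor condition holds."""
    for anchor, target in edges:
        if (anchor == '^' and at_bol) or (anchor == '$' and at_eol):
            return target
    return None

edges = resolve_anchor_edges([('$', 7), ('^', 3)])  # '$' rule declared first
print(edges)                           # [('$', 7)]
print(take_anchor(edges, True, True))  # 7 on an empty line
```

If the '^' rule is declared first, both edges survive and the '^' edge wins on an empty line, so declaration order, not a fixed '^'-before-'$' check, decides the outcome.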


Note that using Perl rules is not the answer: lexers use leftmost-longest
matching and compile to a DFA, whereas Perl gives precedence to the
leftmost alternative and uses an NFA.
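The difference can be seen with a brute-force reference sketch in Python. Python's `re` is a backtracking, leftmost-first engine like Perl's; the hypothetical `lex_longest` helper below emulates lex-style rule selection (longest match wins, ties go to the earliest rule) by trying every prefix. It is far slower than a DFA and is only meant to illustrate the semantics:

```python
import re
from typing import List, Optional, Tuple

def perl_first(pattern: str, text: str) -> Optional[str]:
    """Leftmost-first: the first alternative that matches wins (Perl/Python re)."""
    m = re.match(pattern, text)
    return m.group() if m else None

def lex_longest(rules: List[str], text: str) -> Optional[Tuple[int, int]]:
    """Leftmost-longest across rules, anchored at text[0].
    Returns (rule_index, match_length); ties go to the earliest rule."""
    best: Optional[Tuple[int, int]] = None
    for i, rule in enumerate(rules):
        # Try prefixes from longest to shortest; fullmatch must consume the
        # whole prefix, so every alternative is considered, not just the first.
        for end in range(len(text), -1, -1):
            if re.fullmatch(rule, text[:end]):
                if best is None or end > best[1]:
                    best = (i, end)
                break
    return best

print(perl_first(r'a|ab', 'abc'))               # a: first alternative wins
print(lex_longest([r'a|ab'], 'abc'))            # (0, 2): longest match wins
print(lex_longest([r'if', r'[a-z]+'], 'ifx'))   # (1, 3)
print(lex_longest([r'if', r'[a-z]+'], 'if'))    # (0, 2): tie, earlier rule wins
```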


Regards,


Ben

