Re: Regular expression grammar?

lex@cc.gatech.edu
20 Sep 1999 12:00:25 -0400

          From comp.compilers

Related articles
Regular expression grammar? bediger@teal.csn.net (Bruce Ediger) (1999-09-16)
Re: Regular expression grammar? jjan@cs.rug.nl (J.H.Jongejan) (1999-09-20)
Re: Regular expression grammar? terryg@uswest.net (1999-09-20)
Re: Regular expression grammar? lex@cc.gatech.edu (1999-09-20)
Re: Regular expression grammar? cbarron3@ix.netcom.com (1999-09-20)
Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24)
Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24)
Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24)
Re: Regular expression grammar? cbarron3@ix.netcom.com (1999-09-27)
| List of all articles for this month |
From: lex@cc.gatech.edu
Newsgroups: comp.compilers
Date: 20 Sep 1999 12:00:25 -0400
Organization: Georgia Institute of Technology, Atlanta GA, USA
References: 99-09-051
Keywords: lex

Bruce Ediger <bediger@teal.csn.net> writes:


[ grammers for regular expressions ]


> --------
> %token SYMBOL OR STAR LPAREN RPAREN
> %%
> regexp
> : SYMBOL /* any letter of the alphabet (c) */
> | regexp OR regexp /* alternation (d) */
> | regexp regexp /* concatenation (d) */
> | regexp STAR /* Kleene Closure (d) */
> | RPAREN regexp LPAREN /* grouping */
> ;
> %%
> --------
> This gives a bunch of shift/reduce conflicts. I can't seem to get around
> them by putting in operator precedences, or the analogy of algebraic
> terms and factors.




The simplest approach would probably be to play with the precedence
and associativity features that bison and perhaps yacc have. However,
you can do it with a straight BNF description, as well. To this end,
your idea of algebraic terms and factors actually seems to be the key.




For regular expressions, the operators are *, |, (), and
concatenation. Their precendense goes, I think, like this:


|
concatenation
*
()




(I might have | and concatenation swapped -- it's easy to tweak if
this is wrong). Assign the following precedence levels:


| 1
conc. 2
* 3
() 4 ("primary")




Let "r3" be an expression with only level 3 and higher operators, "r2"
be one with only level 2 and higher, etc. Then the following grammer
produces no conflicts:




%token SYMBOL OR STAR LPAREN RPAREN
%%
regexp : r1 ;


r1 : r2 | r1 OR r2 ;
r2 : r3 | r2 r3 ;
r3 : primary | r3 STAR ;


primary : parenthesized | SYMBOL ;




parenthesized : LPAREN regexp RPAREN ;
%%






-Lex Spoon


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.