Related articles |
---|
Regular expression grammar? bediger@teal.csn.net (Bruce Ediger) (1999-09-16) |
Re: Regular expression grammar? jjan@cs.rug.nl (J.H.Jongejan) (1999-09-20) |
Re: Regular expression grammar? terryg@uswest.net (1999-09-20) |
Re: Regular expression grammar? lex@cc.gatech.edu (1999-09-20) |
Re: Regular expression grammar? cbarron3@ix.netcom.com (1999-09-20) |
Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24) |
Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24) |
[2 later articles] |
From: | Bruce Ediger <bediger@teal.csn.net> |
Newsgroups: | comp.compilers |
Date: | 16 Sep 1999 01:56:12 -0400 |
Organization: | Compilers Central |
Keywords: | parse, question |
Dr Dobb's Journal, April 1999 carried an article by Brian Kernighan
and Rob Pike titled "Regular Expressions". I read the article, and
I was struck by the beginning sentences of the second-to-last paragraph:
For regular expressions with parentheses and alternatives,
the implementation must be more sophisticated. One approach
is to compile the regular expression into a parse tree that
captures its grammatical structure.
How the heck do they do that compilation? I've tried to come up with
a "yacc" grammar for regular expressions based on the definition of
"regular expression" in "Automata and Formal Languages" by Dean
Kelley, ISBN 0-13-497777-7, Prentice Hall, 1995.
Kelly defines regular languages like this (page 38):
(a) The empty set is a regular language
(b) The set containing the zero-length string is a regular language
(c) The set containing every single character in the alphabet is regular
(d) For two regular languages A and B, A | B, AB, A* are regular
(e) No other languages are regular
His notation for regular expressions follows directly, and seems
pretty standard to me.
My "yacc" grammar for regular expressions:
--------
%token SYMBOL OR STAR LPAREN RPAREN
%%
regexp
: SYMBOL /* any letter of the alphabet (c) */
| regexp OR regexp /* alternation (d) */
| regexp regexp /* concatenation (d) */
| regexp STAR /* Kleene Closure (d) */
| RPAREN regexp LPAREN /* grouping */
;
%%
--------
This gives a bunch of shift/reduce conflicts. I can't seem to get around
them by putting in operator precedences, or the analogy of algebraic
terms and factors.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.