Regular expression grammar?

Bruce Ediger <>
16 Sep 1999 01:56:12 -0400

          From comp.compilers

Related articles
Regular expression grammar? (Bruce Ediger) (1999-09-16)
Re: Regular expression grammar? (J.H.Jongejan) (1999-09-20)
Re: Regular expression grammar? (1999-09-20)
Re: Regular expression grammar? (1999-09-20)
Re: Regular expression grammar? (1999-09-20)
Re: Regular expression grammar? (Zalman Stern) (1999-09-24)
Re: Regular expression grammar? (Zalman Stern) (1999-09-24)
[2 later articles]
| List of all articles for this month |

From: Bruce Ediger <>
Newsgroups: comp.compilers
Date: 16 Sep 1999 01:56:12 -0400
Organization: Compilers Central
Keywords: parse, question

Dr Dobb's Journal, April 1999 carried an article by Brian Kernighan
and Rob Pike titled "Regular Expressions". I read the article, and
I was struck by the beginning sentences of the second-to-last paragraph:

      For regular expressions with parentheses and alternatives,
      the implementation must be more sophisticated. One approach
      is to compile the regular expression into a parse tree that
      captures its grammatical structure.

How the heck do they do that compilation? I've tried to come up with
a "yacc" grammar for regular expressions based on the definition of
"regular expression" in "Automata and Formal Languages" by Dean
Kelley, ISBN 0-13-497777-7, Prentice Hall, 1995.

Kelly defines regular languages like this (page 38):

      (a) The empty set is a regular language
      (b) The set containing the zero-length string is a regular language
      (c) The set containing every single character in the alphabet is regular
      (d) For two regular languages A and B, A | B, AB, A* are regular
      (e) No other languages are regular

His notation for regular expressions follows directly, and seems
pretty standard to me.

My "yacc" grammar for regular expressions:

        : SYMBOL /* any letter of the alphabet (c) */
        | regexp OR regexp /* alternation (d) */
        | regexp regexp /* concatenation (d) */
        | regexp STAR /* Kleene Closure (d) */
        | RPAREN regexp LPAREN /* grouping */
This gives a bunch of shift/reduce conflicts. I can't seem to get around
them by putting in operator precedences, or the analogy of algebraic
terms and factors.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.