16 Sep 1999 01:56:12 -0400

Related articles |
---|

Regular expression grammar? bediger@teal.csn.net (Bruce Ediger) (1999-09-16) |

Re: Regular expression grammar? jjan@cs.rug.nl (J.H.Jongejan) (1999-09-20) |

Re: Regular expression grammar? terryg@uswest.net (1999-09-20) |

Re: Regular expression grammar? lex@cc.gatech.edu (1999-09-20) |

Re: Regular expression grammar? cbarron3@ix.netcom.com (1999-09-20) |

Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24) |

Re: Regular expression grammar? zalman@netcom18.netcom.com (Zalman Stern) (1999-09-24) |

[2 later articles] |

From: | Bruce Ediger <bediger@teal.csn.net> |

Newsgroups: | comp.compilers |

Date: | 16 Sep 1999 01:56:12 -0400 |

Organization: | Compilers Central |

Keywords: | parse, question |

Dr Dobb's Journal, April 1999 carried an article by Brian Kernighan

and Rob Pike titled "Regular Expressions". I read the article, and

I was struck by the beginning sentences of the second-to-last paragraph:

For regular expressions with parentheses and alternatives,

the implementation must be more sophisticated. One approach

is to compile the regular expression into a parse tree that

captures its grammatical structure.

How the heck do they do that compilation? I've tried to come up with

a "yacc" grammar for regular expressions based on the definition of

"regular expression" in "Automata and Formal Languages" by Dean

Kelley, ISBN 0-13-497777-7, Prentice Hall, 1995.

Kelly defines regular languages like this (page 38):

(a) The empty set is a regular language

(b) The set containing the zero-length string is a regular language

(c) The set containing every single character in the alphabet is regular

(d) For two regular languages A and B, A | B, AB, A* are regular

(e) No other languages are regular

His notation for regular expressions follows directly, and seems

pretty standard to me.

My "yacc" grammar for regular expressions:

--------

%token SYMBOL OR STAR LPAREN RPAREN

%%

regexp

: SYMBOL /* any letter of the alphabet (c) */

| regexp OR regexp /* alternation (d) */

| regexp regexp /* concatenation (d) */

| regexp STAR /* Kleene Closure (d) */

| RPAREN regexp LPAREN /* grouping */

;

%%

--------

This gives a bunch of shift/reduce conflicts. I can't seem to get around

them by putting in operator precedences, or the analogy of algebraic

terms and factors.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.