Related articles |
---|
RE's to CFG's clive_minnican@hotmail.com (Clive Minnican) (2000-12-18) |
Re: RE's to CFG's cfc@world.std.com (Chris F Clark) (2000-12-19) |
Re: RE's to CFG's philip.fortomas@virgin.net (Philip Fortomas) (2000-12-20) |
From: | Chris F Clark <cfc@world.std.com> |
Newsgroups: | comp.compilers |
Date: | 19 Dec 2000 17:03:27 -0500 |
Organization: | The World Public Access UNIX, Brookline, MA |
References: | 00-12-070 |
Keywords: | parse, lex |
Posted-Date: | 19 Dec 2000 17:03:27 EST |
> Does anyone know of any good internet documents explaining the process
> involved in converting Regular Expressions (RE's) to Context-Free
> Grammars (CFG's) by hand? Or can anyone explain it to me easily?
The easy way to convert a regular expression to a context-free grammar
is by way of the linear grammars. Linear grammars have rules like:
non_terminal: non_terminal TERMINAL_1 TERMINAL_2 TERMINAL_3
| TERMINAL_4 TERMINAL_5;
Left-linear grammars allow rules with left-recursion (but no right nor
"center" recursion--the type used for parenthesized expressions).
Right-linear grammars only allow right recursion. General linear
grammars allow both left and right recursion, but no "center"
recursion.
// disallowed "center" recursion--not a linear grammar
non_terminal: TERMINAL_1 non_terminal TERMINAL_2;
I believe (but cannot state definitively) that you can reference other
non-terminals in a linear grammar. Technically, however, the
references to the other non-terminals must not result in indirect
recursion of the prohibited type(s) (or the grammar is not a linear
one). Now, since we don't really care whether the resulting grammar
is linear or not, this prohibition is irrelevant. However, if you had
a grammar that you wanted to turn back into a regular expression, the
prohibition is important, because only simple left and right
recursions have regular expression equivalents and any indirect
recursion that cannot be reduced to a simple left or right recursion
cannot be translated into a regular expression.
In any case, there is a simple translation into a linear grammar for
each regular-expression operator. Consider the following example:
non_terminal: A (B | C D+) E? (F G+ H)*;
1) Create a new non-terminal for each regular-expression operator.
b_or_c_d_plus: ... // nested | operators are a regular-expression
d_plus: ... // + is a regular-expression operator
e_qmark: ... // ? is a regular-expression operator
f_g_plus_h_star: ... // * is a regular-expression operator
g_plus: ... // + is a regular-expression operator
2) Write each new operator non-terminal out as its right-linear
translation.
non_terminal: A b_or_c_d_plus e_qmark f_g_plus_h_star;
// use new non_terminals in place of regular-expressions
b_or_c_d_plus: b | c d_plus;
// simply rewrite the nested | operator as is
d_plus: d | d d_plus;
// right linear form for a + operator
// uses right recursion to represent iteration
e_qmark: /* empty */ | e;
// linear form for a ? operator
// uses an empty (aka epsilon or lambda) alternative
f_g_plus_h_star: /* empty */ | f g_plus h
// right linear form for a * operator
// combines + (recursion) and ? (empty) concepts
g_plus: g | g_plus;
// same as before
Note, the translation of regular-expressions into new non-terminals
increases the number of non-terminals in the grammar and as a result
makes the parsing tables larger. However, before 1990 there weren't
common parser generators that could handle ELL or ELR grammars
(grammars with embedded regular expressions). As a result, you will
find a lot of language standards that have non-terminals like
id-list-opt to represent id* (or perhaps (id ("," id)*)? ). These
non-terminals represented the linearized regular-expressions.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.