Re: Regular Expressions (Scott J. McCaughrin)
Tue, 28 Nov 1995 08:26:05 GMT

          From comp.compilers

Related articles
[9 earlier articles]
Re: regular expressions wendt@CS.ColoState.EDU (1993-03-22)
Regular Expressions (trejo ortiz alejandro augusto) (1995-10-16)
Re: Regular Expressions (Mitchell Perilstein) (1995-10-23)
Re: Regular Expressions (1995-10-29)
Re: Regular Expressions (Colm O'Dunlaing) (1995-10-31)
Re: Regular Expressions (1995-11-03)
Re: Regular Expressions (1995-11-28)
Re: Regular Expressions (1995-11-29)
| List of all articles for this month |

Newsgroups: comp.compilers
From: (Scott J. McCaughrin)
Keywords: DFA
Organization: University of Illinois at Urbana
References: 95-11-033 95-10-087
Date: Tue, 28 Nov 1995 08:26:05 GMT

In a previous article, (Colm O'Dunlaing) says:

>1) Is there a canonical form for regular expressions(over a finite alphabet)?
> Yes, sort of. Convert a given regular expression R to a
> nondeterministic finite automaton N, then convert N to a
> determininistic finite automaton D, then apply minimisation to
> D to construct a minimal d.f.a. M. The automaton M is unique
> for the regular set it accepts, so M can be taken as a canonical
> form for the regular expression (I wouldn't bother about converting
> M back to a regular expression).

  This sounds more like a recipe for building the recognizer. So far as r.e.'s
  per se are conscerned, there is no such stipulation as "canonical form".

> Conversion from N to D may increase the number of states
> exponentially. So the canonical form could be very
> large. It is almost certain that any other choice of
> canonical form could also be very large, since the
> problem: Are R and S equivalent? (where R and S are
> regular expressions) is PSPACE complete.
> See
> 'Introduction to automata theory, languages, and computation,'
> by Hopcroft and Ullman.

  In any event, this is certainly not how r.e. recognizers are built in

>3) An ambiguous grammar for regular expressions over the alphabet {a, b}is the
> R::=RR | R + R | R* | (R) |a|b
>The question is: How can I state an unambiguous grammar for regular
> (You probably need two more symbols \emptyset and \lambda
> for the empty set and the empty string. However...)

  Well, \emptyset can be provably excluded from the set recognized by a
  d.f.s.a. (in fact, even from the n.f.s.a.)

> I think, but am not absolutely certain, that the following
> grammar will work. Implicitly, it prioritises operators
> in the order * > concatenation > +, and orders evaluation
> from right to left (I think).

  No it doesn't.

> A::= (R)|a|b
> B::=A*|B*
> C::=AR|BR
> D::=A+R|B+R|C+R
> R::=A|B|C|D
  How can it prioritize operators if B,C,D are all top-level yet each
  denotes a different operation? For starters, you need the start-symbol
  (R) to produce only right-hand sides involving the lowest-priority ops.
  But hey -- nice try!

  -- Scott McC.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.