Re: Theoretical CFG question

pab@cs.arizona.edu (Peter A. Bigot)
Mon, 16 Nov 1992 20:00:27 GMT

From comp.compilers

Related articles
Theoretical CFG question norlin@essex.ecn.uoknor.edu (1992-11-13)
Re: Theoretical CFG question jos@and.nl (1992-11-15)
*Re: Theoretical CFG question pab@cs.arizona.edu* (1992-11-16)**
Re: Theoretical CFG question jos@and.nl (1992-11-17)

| List of all articles for this month |

Newsgroups:	comp.compilers
From:	pab@cs.arizona.edu (Peter A. Bigot)
Organization:	U of Arizona CS Dept, Tucson
Date:	Mon, 16 Nov 1992 20:00:27 GMT
References:	92-11-067
Keywords:	parse

norlin@essex.ecn.uoknor.edu (Norman Lin) writes:
>The problem was:
> *
>For i>=1, let b(i) denote the string in 1(0+1) that is the
>binary representation of i.
>
>Construct a CFG generating
> +
> {0,1,#} - {b(1)#b(2)# ... #b(n) | n>=1}

I'll refrain from specific comments about a professor who isn't interested
in addressing questions about problems nobody in a class can solve (did
you try asking hir in office hours?).

This is problem 4.3 in Hopcroft and Ullman. Note that although the
complement of the language is not a CFL (pump on it), CFL's aren't closed
under complement; they are closed under union, though, so you can simply
define CFGs for each way a string could fail to be in the form
b_1#b_2#...#b_n.

I've attached my own solution to this from a couple years back (this is
ugly TeX before I switched to cleaner LaTeX; the idea should be present).
The conclusion of the class (and the professor), however, was that the
problem was way too hard; and if the even numbered b_i were given in
reverse, the resulting grammar would be feasible (easier to generate
incorrect mirror images). As noted, the actual grammar below is not
correct.

\def\re#1{
                \ifmmode{\rm\bf #1}
                \else{\bf #1}
                \fi}

\proofsect {Method}: The idea is to split $L$ into several languages, each
of which can be constructed using a context free grammar, then merging the
grammars to form one for $L$.

{\advance\leftskip by 0.25in
\item{1} Strings which do not begin with $b_1$ are not in the language.\par
\item{2} Strings in which one of the purported $b_i$ does not begin with a
$\re1$ are not in the language.\par
\item{3} Strings in which some sequence purported to be $b_i\#b_{i+1}$ has an
error are not in the language. I assume that $b_i$ is well formed, and the
error is in $b_{i+1}$; without this assumption, correctness is harder to
ensure (variations in $b_i$ may make the supposedly ill-formed $b_{i+1}$
correct). This set can be further subdivided:\par
{\advance\leftskip by 0.25in
\item{3a} If we assume that $b_i = \re1^n$, then $b_{i+1}$ should be
$\re1\re0^n$. The error is either in the leading \re1 or in the
sequence of $n$ \re0s of $b_{i+1}$.\par
\item{3b}. Otherwise, $b_i$ is of the form $\rho01^n$ where $n \ge 0$.
$b_{i+1}$ should then be $\rho10^n$. The error can occur in four places:
extra characters before the prefix, or an incorrect character match in
the prefix, the \re1 immediately following the prefix, or the trailing
\re0s.\par
}}

\hang A grammar which was derived from this partitioning is attached.
Nonterminals are enclosed in angle brackets, and productions can be combined
using a vertical bar when the nonterminal being rewritten is the same.
(Although this grammar does not seem to generate strings which are not in $L$,
there are strings of $L$ which it will not generate. Right idea; wrong
details---this really is a horrible problem.)

// Solution to problem 4.3, Hopcroft & Ullman
//
01# // symbols in CFG
<S> // External nonterminals
<S> // Start symbol

<S> -> <L1> // $w where w != (1 or 1#)
          | <L2> // #w where w [1] != 1
          | <H><L3><T> // there is a bi#bj where i+1 <> j in the mix

// Odds and ends--sequences of ones, digits, numbers, anything
<O> -> 1<O>
          | //epsilon
<D> -> 1
          | 0
<N> -> <D><N>
          | //epsilon
<Q> -> 0
          | 1
          | #
<W> -> <Q><W>
          | //epsilon

// Sequence starts with something other than b1
<L1> -> 0<W>
            | #<W>
            | 1<D><W>

// There is an ill-formed integer somewhere in the mix
<L2> -> <W>#0<W>
            | <W>##<W>

// Head and tail of string: ensure that <H>w<T> makes w a bi#bj sequence.
<H> -> <W>#
          | //epsilon
<T> -> #<W>
          | //epsilon

<L3> -> <La> // Inverse of 1^n#10^n, intersect 1*#<W>
            | <Lb> // Inverse of p01^n#p10^n

<La> -> 1<La>0 // 1s and 0s still balance
            | 1<La_F>1 // Bad 1 on right
            | #0<N> // Error--0 starts right
            | #1<D><N> // Too much on right
            | <O># // Too many 1s on left
<La_F> -> 1<La_F><Q> // Maintain balance of 1s/0s
                | #<N> // Tack on whatever you want on right

// Inverse of p01^n#p10^n
// Assume error in prefix: 0 on left, 1 on right
<P-L-A0> -> <D><P-L-A0><D> | 0 // Error is center 0; generate prefix around it
<P-L-B0> -> <D><P-L-A0>0 // Generate 0 following prefix
<P-L-C0> -> <D><P-L-C0>1 | <P-L-B0> // Generate trailing 1s in left side
<P-L-D0> -> <D><P-L-C0>#<N> // Generate separating hash
<P-L-E0> -> <D><P-L-E0><D> | <P-L-D0> // Generate upper end of right prefix
<P-L-A1> -> <D><P-L-A1><D> | 1 // Error is center 1; generate prefix around it
<P-L-B1> -> <D><P-L-A1>0 // Generate 0 following prefix
<P-L-C1> -> <D><P-L-C1>1 | <P-L-B1> // Generate trailing 1s in left side
<P-L-D1> -> <D><P-L-C1>#<N> // Generate separating hash
<P-L-E1> -> <D><P-L-E1><D> | <P-L-D1> // Generate upper end of right prefix
<P-R-A1> -> <D><P-R-A1><D> | 1 // Error is center 1; generate prefix around it
<P-R-B1> -> <N>#<P-R-A1><D> // Separating hash
<P-R-C1> -> 1<P-R-C1><D> | <P-R-B1> // Tail 1s of left side
<P-R-D1> -> 0<P-R-C1><D> // After prefix 0 on left side
<P-R-E1> -> <D><P-R-E1><D> | <P-R-D1> // Right edge of left prefix
<P-R-A0> -> <D><P-R-A0><D> | 0 // Error is center 0; generate prefix around it
<P-R-B0> -> <N>#<P-R-A0><D> // Separating hash
<P-R-C0> -> 1<P-R-C0><D> | <P-R-B0> // Tail 1s of left side
<P-R-D0> -> 0<P-R-C0><D> // After prefix 0 on left side
<P-R-E0> -> <D><P-R-E0><D> | <P-R-D0> // Right edge of left prefix

// Assume error at first zero after prefix: pairs with 0 on right
<C-L-A0> -> <D><C-L-A0>1 | 0 // 0 after prefix
<C-L-B0> -> <D><C-L-A0># // Trailing 1s plus hash
<C-L-C0> -> <D><C-L-C0><D> | <C-L-B0> // Part of right prefix
<C-R-Z0> -> <D><C-R-Z0>0 | 0 // Make "1" after prefix a 0
<C-R-Y0> -> #<C-R-Z0>0 // Separating hash
<C-R-X0> -> 1<C-R-X0>0 | <C-R-Y0> // Last 1s of left side

// Assume error in trailing 1s on left side: pairs with 1 on right

<T-A> -> 1<T-A><D> // Last 1s of left tail, last Xs of right tail
          | #<N>1 // Sep, prefix, 0, 1s, and error

<Lb> -> <P-L-A0><P-R-E1> // Break before end of left prefix
            | <P-L-C0><P-R-C1> // Break somewhere in trailing 1s on left
            | <P-L-E0><P-R-A1> // Break somewhere in left of right prefix
            | <P-L-A1><P-R-E0> // Break before end of left prefix
            | <P-L-C1><P-R-C0> // Break somewhere in trailing 1s on left
            | <P-L-E1><P-R-A0> // Break somewhere in left of right prefix
                // Error in first zero after left side prefix
            | <C-L-A0><C-R-X0> // Break on left side
            | <C-L-C0><C-R-Z0> // Break on right side
                // Error in trailing 1s of left side after prefix
            | <N>1<T-A>

--
                                          Peter A. Bigot -- pab@cs.arizona.edu
                    Dept. of Computer Science, University of Arizona, Tucson AZ
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Theoretical CFG question

pab@cs.arizona.edu (Peter A. Bigot)Mon, 16 Nov 1992 20:00:27 GMT

pab@cs.arizona.edu (Peter A. Bigot)
Mon, 16 Nov 1992 20:00:27 GMT