Re: Some Finite State Machine and Parsing problems

clark@quarry.zk3.dec.com (Chris Clark USG)
3 Nov 1997 21:02:11 -0500

          From comp.compilers

Related articles
Some Finite State Machine and Parsing problems rboland@csi.uottawa.ca (Ralph Boland) (1997-11-02)
Re: Some Finite State Machine and Parsing problems clark@quarry.zk3.dec.com (1997-11-03)
| List of all articles for this month |

From: clark@quarry.zk3.dec.com (Chris Clark USG)
Newsgroups: comp.compilers,comp.theory
Date: 3 Nov 1997 21:02:11 -0500
Organization: Digital Equipment Corporation - Marlboro, MA
Distribution: inet
References: 97-11-011
Keywords: theory, DFA

You asked some interesting questions in regard to finite state
transducers and regular expression transducers.


> Q1.1 Are there useful applications where we would want to
> represent an FST by a RET?


Probably. However, there is one important consideration. A RE
corresponds most closely to a NFA (non-deterministic finite state
automata). Similarly, a RET corresponds most closely to a
non-deterministic finite state transduced. Now, with finite state
automata there are simple transformations to convert a non-
deterministic automata into a deterministic one, and this is important
to your later questions. In particular, the non-determinism is what
gives rise to the lookahead needs.


That does not mean all is lost; an Earley parser can deal with the
non-determinism and handle your RET's.


> Q1.4 Can we really do this [extend RRP(k) parsers to handle RETs and
> parse using them] and if so what are the applications?


Yes and no. The extension won't work in general because of the
non-determinism. An RRP(k) parser depends on the fact that a NFA can
be converted to a DFA, because an RRP(k) parser can only advance to
one state at a time (and must do so deterministicly). Now, it is
possible to consider non-deterministic RRP(k) parsers, but that is not
what most people mean by RRP(k). [Note, here I am assuming you mean
ELR(k) or LL(k) for RRP(k) since those are the common RRP(k) variants
and your later description matches an early ELR(k) implementation
method.]


> RRP(k) parsers work by processing input twice, once in each direction.


Not necessarily. Yacc++ does only one pass over the input (one
forward pass, no backward passes). When we wrote it we didn't know
that you would "need" two passes for ELR(k). Of course, in the
interim we have discovered a paper by Nakata and Sassa which has
buried in it the proof that only one pass is necessary.


> Q1.5 If we try the same trick with a RET can we build a machine
> with a finite number of states corresponding to the RET?
> Clearly we can process the input from left to right with
> a finite number of states. Can we then process the input
> a second time in the reverse direction so that we can
> output a character for each character of input read?


Yes, I believe you are correct here. I believe you can process an RET
in two passes, provided that you have a way of resolving the
ambiguity. Although that may require ambiguity resolution by fiat
(i.e. if the RET is ambiguous, you guarantee that one of the legal
outputs will be selected, but not necessarily which one).


> I call such expressions recursive regular expressions (RREs).


This is exactly the same set as the ELR(k) [or RRP(k)] grammars.
Well, if you require non-determinism, the set would be E-Earley
(i.e. Early parsing of a regular right part grammars).


> Q2.1 Can we translate such expressions into a grammar?


Yes, there are translations which convert RE's into recursive grammar
form. Some early ELR(k) parsers used that technology. More recent
parsing methodologies allow the direct conversion of ELR(k) grammars
into parsers.


> Q2.2 Are there parsing techknowlogies that can parse such
> grammars efficiently, preferably in linear time.


Yes, the parsing complexity is exactly the same as LR(k).


> Note that, though LR(k) parsing cannot handle a palidrome


LR(k) parsing can handle palindromes (and your grammar is the LR(k)
grammar for the exact palindromes you describe), the stack exactly
assures that it can. There are languages that LR(k) cannot handle,
for example the copy language, but palindromes are fine.


> I hope some of you find these problems interesting
> and can provide me with some answers or directions for
> future research.


Your problems certainly show some interesting directions and are worth
your pursuing more. And don't let the fact that you have made some
errors stop you. Sometimes errors can lead to astounding discoveries.
For example, when we first wrote Yacc++, we hadn't read that direct
ELR parsing was considered a hard unsolved problem [at that time],
with complicated read-back mechanisms and other implementation
difficulties. We just thought that combining a couple of simple ideas
should work it out. Of course, some of those simple ideas were
actually errors in our understanding of standard LR parsing, although
since then some of the errors have become alternative LR parsing
methods (such as non-canonical LR parsing). Had we been frightened by
the difficulty of the task or not made the errors we made (which
turned out to simplify the problem conceptually), we would not have
found our solution to the problem. The same may hold for you.


For example, your Q1.5 is very similar to an idea that I have which
suggests that all non-ambiguous grammars (consisting of recursion and
regular expressions) can be parsed in linear time (not just the LR(k)
ones).


-Chris Clark
************************************************************************
Compiler Resources, Inc. email: compres@world.std.com
3 Proctor St. http://world.std.com/~compres
Hopkinton, MA 01748 phone: (508) 435-5016
USA 24hr fax: (508) 435-4847
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.