Re: Formatting of Language LRMs

federation2005@netzero.com
Fri, 1 Aug 2014 14:58:57 -0700 (PDT)

From comp.compilers

Related articles
[3 earlier articles]
Re: Formatting of Language LRMs gah@ugcs.caltech.edu (glen herrmannsfeldt) (2014-06-21)
Re: Formatting of Language LRMs kaz@kylheku.com (Kaz Kylheku) (2014-06-21)
Re: Formatting of Language LRMs Pidgeot18@verizon.net.invalid (=?UTF-8?B?Sm9zaHVhIENyYW5tZXIg8J+Qpw==?=) (2014-06-22)
Re: Formatting of Language LRMs mertesthomas@gmail.com (2014-06-30)
Re: Formatting of Language LRMs ivan@ootbcomp.com (Ivan Godard) (2014-07-03)
Re: Formatting of Language LRMs federation2005@netzero.com (2014-07-28)
*Re: Formatting of Language LRMs federation2005@netzero.com* (2014-08-01)**
Re: Formatting of Language LRMs gah@ugcs.caltech.edu (glen herrmannsfeldt) (2014-08-03)

| List of all articles for this month |

From:	federation2005@netzero.com
Newsgroups:	comp.compilers
Date:	Fri, 1 Aug 2014 14:58:57 -0700 (PDT)
Organization:	Compilers Central
References:	14-06-010 14-07-064
Keywords:	syntax, theory
Posted-Date:	02 Aug 2014 11:01:16 EDT

On Monday, July 28, 2014 7:55:12 PM UTC-5, federat...@netzero.com wrote:
> ... publication of algebraic frameworks for level 2 in the Chomsky
> Hierarchy. An algebra was defined in LNCS 4988 ("The Algebraic
> Approach I, II")

A framework for 2nd order formalisms (expressed as Eilenberg-Moore algebras)
that does the analogue to what the Peano Axioms do for arithmetic.

I put some notes that expand on 4988 up on Scribd under
The Algebraic Approach II - V
http://www.scribd.com/doc/235317956/The-Algebraic-Approach-II-V

This is still a work in progress.

> An equivalent formalism was devised by Kozen a couple years ago
http://arxiv.org/abs/1309.0893
Actually Grathwohl is the lead author.

> BNF allows sequences on the RHS of a rule.
> EBNF allows regular expressions on the RHS of a rule.
> (???) allows full-fledged context-free expressions (CFE) on the RHS.
>
> There are a couple ways one can specify a CFE. One, which harkens to the
older
> formalism (predating 2008) is the "grammar expressions", consisting of a
> series of rules followed by an expression (much like GCC's "statement
> expression"). (Even in subexpressions).

The "older formalism" is the one which has been in place since the
1970's developed by Gruska, McWhirter and Yntema. The key elements are
the substitution operator (which I denote here (x = A, B)), and the
least fixed point operator (mu x.A). This may be grafted on top of the
Kleene algebra of regular expressions. The star A* B becomes (mu x.B |
Ax), for instance.

> The newer notation permits "bra" and "ket" operators in expressions on the
> RHS.

... and is the one I've spoken of here and elsewhere in the past. It is a
direct outgrowth of the Chomsky-Schutzenberger Theorem, and incorporates an
algebra (that would be very familiar to people in DSP who work with
multiresolution analysis)
      bd = 1 = pq, bq = 0 = pd, db + qd = 1
and the {b,d,p,q} commute with the {x,y}'s.

So, the Dyck language, D(x,y) = mu S.(1 | xSy | SS) would simply become
      b(xp | yq)* d.

Such an expression -- minus the algebra -- comes straight out of the
Chomsky-Schutzenberger Theorem, in which one obtains the language D(x,y) by
(1) expanding the alphabet X = {x,y} to X' = {x,y,b,d,p,q},
(2) devising a regular expression (which I term the "Chomsky-Schutzenberger
Kernel") b (xp | yq)* d over that alphabet
(3) taking the intersection of this with the interleave
      (x | y)* interleave D(b,p;d,q)
where the 2-bracket Dyck language is D(b,p;d,q) = mu S.(1 | bSd | pSq | SS)
(4) mapping {b,p,d,q} to the empty word 1.

This can be extended to a more general algbera in which multiple bracket
symbols are permitted b0,b1,b2,... d0,d1,d2,.... The corresponding algebra
then assumes a form reminiscent of the "bra" and "ket" notation used in
Hilbert spaces:
      b0, b1, b2, ... become the "bras" <0|, <1|, <2|, ...
      d0, d1, d2, ... become the "kets" |0>, |1>, |2>, ...
and the underlying algebra generalizes to one which has the identities

If you are encoding context-free transductions with input alphabet X and
output alphabet Y, the very same framework applies, with the additional
element that the X's and Y's commute. The commutativity rule yx -> xy
corresponds to what in parsing theory is called a "look ahead".

The tricky part about incorporating these extensions is devising a syntax for
an extended BNF notation that allows all these elements to cohabit with one
another.

The result is a 3 level framework that has BNF (plus the fixed point and
substitution operators) at level 1, regular expression operators A*, A?, A+
(i.e. EBNF) at level 2, and the bra-ket algebra at level 3.

Thus, a syntax for D(x,y) (z D(x,y))* could be written as the substitution
expression ("Chomsky-Schuetzenberger" form)
      S = <0|(<1|x)*(y|1>)*|0>, S(zS)*
or as the grammar expression ("Gruska-McWhirter-Yntema" form)
      S -> SS, S -> xSy, S -> 1, S(zS)*
or in a wide variety of other ways.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Formatting of Language LRMs

federation2005@netzero.comFri, 1 Aug 2014 14:58:57 -0700 (PDT)

federation2005@netzero.com
Fri, 1 Aug 2014 14:58:57 -0700 (PDT)