Related articles |
---|
[3 earlier articles] |
Re: TeX syntax? ara@nestle.csail.mit.edu (Allan Adler) (2007-02-09) |
Re: TeX syntax? phlucas@f-m.fm (Philipp Lucas) (2007-02-12) |
Re: TeX syntax? jhallen@TheWorld.com (2007-02-16) |
Re: TeX syntax? gah@ugcs.caltech.edu (glen herrmannsfeldt) (2007-02-16) |
Re: TeX syntax? jhallen@TheWorld.com (2007-02-25) |
Re: TeX syntax? gjthill@gmail.com (Jim Hill) (2007-02-25) |
Re: TeX syntax? rockbrentwood@gmail.com (Rock Brentwood) (2021-04-04) |
Re: TeX syntax? gah4@u.washington.edu (gah4) (2021-04-05) |
Re: TeX syntax? gah4@u.washington.edu (gah4) (2021-04-05) |
Re: macros of yore, was TeX syntax? gah4@u.washington.edu (gah4) (2021-04-09) |
From: | Rock Brentwood <rockbrentwood@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 4 Apr 2021 14:08:30 -0500 |
Organization: | Compilers Central |
References: | 07-02-024 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="57677"; mail-complaints-to="abuse@iecc.com" |
Keywords: | syntax |
Posted-Date: | 04 Apr 2021 21:12:04 EDT |
[ This is a followup to a thread from 2007. ]
>I've looked high and low without success. Where can i find
>something resembling the BNF of Knuth's TeX typesetting syntax?
It's in the file weave.web, section 14.
The syntax for TeX was written as a context-sensitive translation
grammar, suitable for a streaming translator, rather than a
context-free grammar. It may be possible to convert it to one (either
directly or as a context-free enveloping grammar with semantic
constraints). That's a matter that may be worth looking into. But in
its present form, there is no tree-building required or involved: it
can stream. The distinction is analogous to that between SSDT's versus
SDT's ([S]SDT = [simple-]syntax-directed translations). SSDT's can be
streamed, SDT's require stacking or treeing values and, in effect, SDT
= SSDT + value-stacking/treeing.
TeX is written in Web which is essentially Pascal + hyper-linked
comments. It is also in C-Web on the main TeX distribution site, which
is C + hyper-linked comments. They *can* be converted directly to more
normal programs with the comments embedded. I did so in the local
versions of my older MiKTeX distribution, but haven't
regression-tested it yet - since I haven't established a working
baseline yet to work off of.
The syntax is in - and an essential part - of the weave.web file. In detail:
Section 14.1 describes the framework used for the syntax
Section 14.2 lists the "category codes" used
Section 14.3 lists additional post-processed lexical units used
Section 14.4 lists describes a processing layer from lexical units to "scraps"
Section 14.5 contains the productions for the context sensitive grammar
Section 15 implements the parser; the most important routine being
translate_cases() (its name in the C-Web file) - as a master "switch"
statement (or "case" statement in Pascal) in section 15.7.
By the way the "open" case (its subcases are in 15.19), "math" subcase
(its sub-subcases iare in 15.20), "var_head" sub-subcase has a bug in
it. The "intro" sub-sub-subcase listed a transition to rule 31,
instead of to rule 33. (I want my money Knuth! :))
I believe it's possible to convert it all to a context-free grammar,
albeit with the possible inclusion of a few semantic constraints. Why
Knuth chose to write everything this way - as borderline obfuscated
code that cannot be scaled upwards or sideways or integrated in other
existing code - is beyond me. But it is not maintainable, and
heavily-laden with Technical Debt; notably, its *critical* reliance on
the dated assumption that the Future would be Pascal, along with all
the other assumptions and - more importantly - the now-unnecessary
restrictions that came out of that.
Much of the very design of the entire Web framework's very conception
and design was premised on the assumed necessity of those
restrictions; and the whole thing can be done on a much simpler
foundation, when remade in more up-to-date terms (relatively speaking)
*natively* in C or C++. Among other things, there isn't a need for any
Web-like framework. You can just simply use ordinary comments. I know,
because I did so: I rewrote the entire set of Web files in my local
copy doing just that. When a baseline is established and it is
regression-validated I'll put a copy up on GitHub.
A follow-up to the additional comments at the end of the article:
>Knuths TeX book is an abomination, describing lexing and parsing
>as mouth, gullet and stomach nonsense.
I know. It's literally a woven and tangled mess - both the book and the code.
>[Well, he invented most of what we know about parsing, he gets to
>explain it any way he wants. Chapters 7 and 8 describe the syntax
>operationally. -John]
Discovery. Not invention. Mathematics is not invented, it is
discovered (and in this case: only a partial and incomplete discovery).
And that, too, is a complete tangle that we had to remake from bottom
up. Now, finally with recent publications [2-5] establishing the
foundations for the new algebraic framework ... along with another,
currently in submission, that may come out in 2021, for the remaking
alluded to in [3] of the 1963 algebraic formulation by Chomsky and
Schuetzenberger [1] that lies at the foundation of this all, we're now
finally in a position to refactor both the theory itself and
everything that's based on it or is an application of it; literally
remaking the entire stack from bottom up.
References:
[1] Chomsky, N., Schuetzenberger, M.: "The algebraic theory of context
free languages". In: Braffort, P., Hirschberg, D. (eds.) Computer
Programming and Formal Systems, pp. 118=E2=80=93161. North-Holland, Amsterdam
(1963)
[2] H. Lei=C3=9F et al: "C-dioids and =CE=BC-continuous Chomsky algebras". In:
Desharnais, J., et al. (eds.) RAMiCS 2018. LNCS, vol. 11194, pp.
21=E2=80=9336. Springer, Cham (2018)
[3] M. Hopkins et al: "Coequalizers and Tensor Products for Continuous
Idempotent Semirings". In: Desharnais, J., et al. (eds.) RAMiCS 2018.
LNCS, vol. 11194, pp. 37-52. Springer, Cham (2018)
[4] M.Hopkins: "The algebraic approach I: the algebraization of the
Chomsky hierarchy". In: Berghammer, R., M=C3=B6ller, B., Struth, G. (eds.)
RelMiCS 2008. LNCS, vol. 4988, pp. 155=E2=80=93172. Springer, Heidelberg
(2008).
[5] N.B.B. Grathwohl et al: "Infinitary axiomatization of the
equational theory of context-free languages". In: Baelde, D., Carayol,
A. (eds.) Fixed Points in Computer Science (FICS 2013). EPTCS, vol.
126, pp. 44=E2=80=9355 (2013)
Return to the
comp.compilers page.
Search the
comp.compilers archives again.