Re: TeX syntax?

Rock Brentwood <>
Sun, 4 Apr 2021 14:08:30 -0500

          From comp.compilers

Related articles
[3 earlier articles]
Re: TeX syntax? (Allan Adler) (2007-02-09)
Re: TeX syntax? (Philipp Lucas) (2007-02-12)
Re: TeX syntax? (2007-02-16)
Re: TeX syntax? (glen herrmannsfeldt) (2007-02-16)
Re: TeX syntax? (2007-02-25)
Re: TeX syntax? (Jim Hill) (2007-02-25)
Re: TeX syntax? (Rock Brentwood) (2021-04-04)
Re: TeX syntax? (gah4) (2021-04-05)
Re: TeX syntax? (gah4) (2021-04-05)
Re: macros of yore, was TeX syntax? (gah4) (2021-04-09)
| List of all articles for this month |

From: Rock Brentwood <>
Newsgroups: comp.compilers
Date: Sun, 4 Apr 2021 14:08:30 -0500
Organization: Compilers Central
References: 07-02-024
Injection-Info:; posting-host=""; logging-data="57677"; mail-complaints-to=""
Keywords: syntax
Posted-Date: 04 Apr 2021 21:12:04 EDT

[ This is a followup to a thread from 2007. ]

>I've looked high and low without success. Where can i find
>something resembling the BNF of Knuth's TeX typesetting syntax?

It's in the file weave.web, section 14.

The syntax for TeX was written as a context-sensitive translation
grammar, suitable for a streaming translator, rather than a
context-free grammar. It may be possible to convert it to one (either
directly or as a context-free enveloping grammar with semantic
constraints). That's a matter that may be worth looking into. But in
its present form, there is no tree-building required or involved: it
can stream. The distinction is analogous to that between SSDT's versus
SDT's ([S]SDT = [simple-]syntax-directed translations). SSDT's can be
streamed, SDT's require stacking or treeing values and, in effect, SDT
= SSDT + value-stacking/treeing.

TeX is written in Web which is essentially Pascal + hyper-linked
comments. It is also in C-Web on the main TeX distribution site, which
is C + hyper-linked comments. They *can* be converted directly to more
normal programs with the comments embedded. I did so in the local
versions of my older MiKTeX distribution, but haven't
regression-tested it yet - since I haven't established a working
baseline yet to work off of.

The syntax is in - and an essential part - of the weave.web file. In detail:
Section 14.1 describes the framework used for the syntax
Section 14.2 lists the "category codes" used
Section 14.3 lists additional post-processed lexical units used
Section 14.4 lists describes a processing layer from lexical units to "scraps"
Section 14.5 contains the productions for the context sensitive grammar

Section 15 implements the parser; the most important routine being
translate_cases() (its name in the C-Web file) - as a master "switch"
statement (or "case" statement in Pascal) in section 15.7.

By the way the "open" case (its subcases are in 15.19), "math" subcase
(its sub-subcases iare in 15.20), "var_head" sub-subcase has a bug in
it. The "intro" sub-sub-subcase listed a transition to rule 31,
instead of to rule 33. (I want my money Knuth! :))

I believe it's possible to convert it all to a context-free grammar,
albeit with the possible inclusion of a few semantic constraints. Why
Knuth chose to write everything this way - as borderline obfuscated
code that cannot be scaled upwards or sideways or integrated in other
existing code - is beyond me. But it is not maintainable, and
heavily-laden with Technical Debt; notably, its *critical* reliance on
the dated assumption that the Future would be Pascal, along with all
the other assumptions and - more importantly - the now-unnecessary
restrictions that came out of that.

Much of the very design of the entire Web framework's very conception
and design was premised on the assumed necessity of those
restrictions; and the whole thing can be done on a much simpler
foundation, when remade in more up-to-date terms (relatively speaking)
*natively* in C or C++. Among other things, there isn't a need for any
Web-like framework. You can just simply use ordinary comments. I know,
because I did so: I rewrote the entire set of Web files in my local
copy doing just that. When a baseline is established and it is
regression-validated I'll put a copy up on GitHub.

A follow-up to the additional comments at the end of the article:
>Knuths TeX book is an abomination, describing lexing and parsing
>as mouth, gullet and stomach nonsense.

I know. It's literally a woven and tangled mess - both the book and the code.

>[Well, he invented most of what we know about parsing, he gets to
>explain it any way he wants. Chapters 7 and 8 describe the syntax
>operationally. -John]

Discovery. Not invention. Mathematics is not invented, it is
discovered (and in this case: only a partial and incomplete discovery).

And that, too, is a complete tangle that we had to remake from bottom
up. Now, finally with recent publications [2-5] establishing the
foundations for the new algebraic framework ... along with another,
currently in submission, that may come out in 2021, for the remaking
alluded to in [3] of the 1963 algebraic formulation by Chomsky and
Schuetzenberger [1] that lies at the foundation of this all, we're now
finally in a position to refactor both the theory itself and
everything that's based on it or is an application of it; literally
remaking the entire stack from bottom up.

[1] Chomsky, N., Schuetzenberger, M.: "The algebraic theory of context
free languages". In: Braffort, P., Hirschberg, D. (eds.) Computer
Programming and Formal Systems, pp. 118=E2=80=93161. North-Holland, Amsterdam
[2] H. Lei=C3=9F et al: "C-dioids and =CE=BC-continuous Chomsky algebras". In:
Desharnais, J., et al. (eds.) RAMiCS 2018. LNCS, vol. 11194, pp.
21=E2=80=9336. Springer, Cham (2018)
[3] M. Hopkins et al: "Coequalizers and Tensor Products for Continuous
Idempotent Semirings". In: Desharnais, J., et al. (eds.) RAMiCS 2018.
LNCS, vol. 11194, pp. 37-52. Springer, Cham (2018)
[4] M.Hopkins: "The algebraic approach I: the algebraization of the
Chomsky hierarchy". In: Berghammer, R., M=C3=B6ller, B., Struth, G. (eds.)
RelMiCS 2008. LNCS, vol. 4988, pp. 155=E2=80=93172. Springer, Heidelberg
[5] N.B.B. Grathwohl et al: "Infinitary axiomatization of the
equational theory of context-free languages". In: Baelde, D., Carayol,
A. (eds.) Fixed Points in Computer Science (FICS 2013). EPTCS, vol.
126, pp. 44=E2=80=9355 (2013)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.