# Re: latex grammar

## David Z Maze <dmaze@mit.edu>, 3 Apr 2004 09:02:40 -0500

From comp.compilers


From: David Z Maze
Newsgroups: comp.compilers
Date: 3 Apr 2004 09:02:40 -0500
Organization: Massachusetts Institute of Technology
References: 04-03-099
Keywords: parse
Posted-Date: 03 Apr 2004 09:02:40 EST

"Lukasz" <journey@op.pl> writes:

> Where can I find grammar for latex so I can create parser?

I suspect you'd have a fair bit of trouble writing a parser for LaTeX
using traditional tools. Fundamentally, LaTeX is just a set of macros
on top of the base TeX language. But that language has some odd
runtime constructs, and lets you change the behavior of the scanner at
runtime. So you can do tricks like

```tex
\def\makecommand#1#2{\expandafter\def\csname #1\endcsname{#2}}
\makecommand{foo}{bar}
\foo % evaluates to "bar"
```

to create a command that defines a new control sequence whose name is
given as an argument, for example. (\expandafter means "set the next
token aside for a moment and expand the one after it"; here the
runtime holds the \def, expands \csname #1\endcsname, which by now is
\csname foo\endcsname, into the single control sequence \foo, then
puts the \def back and sees \def\foo{bar}.) There are also
irregularities like \verb in the "standard" LaTeX language, where the
delimiter for the command's argument is whatever character comes
next, as in \verb|x^2| or \verb+x^2+.
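To make "runtime evaluation" concrete for a parser writer, here is a minimal sketch of a toy expander in Python. All names (`tokenize`, `Expander`, `SRC`) are hypothetical. It assumes fixed category codes, so it deliberately ignores the scanner-changing tricks, and it supports only \def with undelimited #1..#9 parameters, \expandafter, and \csname...\endcsname. It illustrates why \foo has no meaning until earlier input has been executed; it is not meant to mirror how TeX itself is implemented.

```python
import re
from collections import deque

# Control word (trailing spaces skipped) | control symbol | % comment | #n | any other char.
TOKEN = re.compile(r"\\([A-Za-z]+) *|\\(.)|%[^\n]*|#(\d)|(.)", re.S)

def tokenize(src):
    """Split TeX-like source into tokens, assuming fixed (plain-TeX-like) category codes."""
    toks = []
    for m in TOKEN.finditer(src):
        word, symbol, param, char = m.groups()
        if word or symbol:
            toks.append(("cs", word or symbol))
        elif param:
            toks.append(("param", int(param)))
        elif char in ("{", "}"):
            toks.append(("bgroup" if char == "{" else "egroup", char))
        elif char is not None:
            toks.append(("char", char))
        # a % comment matches none of the groups and is simply dropped
    return toks

class Expander:
    def __init__(self, tokens):
        self.input = deque(tokens)
        self.next = self.input.popleft  # pull the next token off the front
        self.macros = {}  # name -> (parameter count, body tokens)

    def push_front(self, toks):
        self.input.extendleft(reversed(toks))

    def read_group(self):
        """Read one argument: a balanced {...} group (outer braces stripped) or one token."""
        tok = self.next()
        if tok[0] != "bgroup":
            return [tok]
        depth, body = 1, []
        while True:
            tok = self.next()
            if tok[0] == "bgroup":
                depth += 1
            elif tok[0] == "egroup":
                depth -= 1
                if depth == 0:
                    return body
            body.append(tok)

    def expand(self, tok):
        """Expand an already-removed token by one level; return False if it is not expandable."""
        if tok[0] != "cs":
            return False
        if tok[1] == "csname":
            name = []
            while (t := self.next()) != ("cs", "endcsname"):
                if not self.expand(t):  # macros inside \csname are expanded first
                    name.append(t[1])   # toy: assumes only character tokens remain
            self.push_front([("cs", "".join(name))])
        elif tok[1] == "expandafter":
            held, after = self.next(), self.next()  # set the next token aside,
            if not self.expand(after):             # expand the one after it once,
                self.push_front([after])
            self.push_front([held])                 # then put the held token back
        elif tok[1] in self.macros:
            nparams, body = self.macros[tok[1]]
            args = [self.read_group() for _ in range(nparams)]
            expansion = []
            for t in body:  # substitute each #n with the n-th argument
                expansion.extend(args[t[1] - 1] if t[0] == "param" else [t])
            self.push_front(expansion)
        else:
            return False
        return True

    def run(self):
        out = []
        while self.input:
            tok = self.next()
            if tok == ("cs", "def"):  # \def\name#1#2{body}, undelimited parameters only
                name, nparams = self.next()[1], 0
                while self.input and self.input[0][0] == "param":
                    self.next()
                    nparams += 1
                self.macros[name] = (nparams, self.read_group())
            elif self.expand(tok):
                continue
            elif tok[0] == "char":
                out.append(tok[1])
            elif tok[0] == "cs":
                raise ValueError(f"undefined control sequence \\{tok[1]}")
            # bare braces only delimit groups; this toy ignores them
        return "".join(out)

SRC = r"""\def\makecommand#1#2{\expandafter\def\csname #1\endcsname{#2}}
\makecommand{foo}{bar}
\foo % evaluates to "bar"
"""
print(Expander(tokenize(SRC)).run().strip())
```

Running it on the three lines of the \makecommand example leaves just "bar" (plus the line breaks). Note that no static grammar could have classified \foo as a command: it only exists after \makecommand has actually been expanded.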

There are probably two ways to go, and which one is better depends on
your goal. You could write an evaluator for base TeX with some effort;
the TeXbook is dense but should have the information you need. Such an
evaluator could read in any TeX, LaTeX, texinfo, ... document; if you
wanted to reimplement the layout engine or convert to plain text, this
might be a good way to go. Or, you could write an evaluator that
understood the higher-level LaTeX constructs, which I gather is what
the various LaTeX-to-foo converters generally try to do. This has the
advantage of capturing the semantic meaning of, e.g., \section, but
will work poorly on documents that use more advanced features (manual
\defs, or packages such as tabularx that provide features
differently). For this approach, having a formatted copy of the LaTeX
macro source is helpful; you could probably also find a reference card
that lists "all" of the standard LaTeX commands.
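The second route can be pictured as a table-driven converter. The following Python is a minimal sketch under loud assumptions: `KNOWN`, `read_braced`, and `convert` are hypothetical names, only three commands taking one braced argument are recognized, and every other command is merely recorded instead of evaluated. That last choice is exactly why this style copes badly with manual \defs: the converter never learns what \foo means.

```python
import re

# Hypothetical table of "known" commands, each taking one braced argument and
# rendered as plain text; a real LaTeX-to-text converter would need many more.
KNOWN = {
    "section": lambda arg: f"\n{arg}\n{'=' * len(arg)}\n",
    "emph": lambda arg: f"*{arg}*",
    "textbf": lambda arg: f"**{arg}**",
}
COMMAND = re.compile(r"\\([A-Za-z]+)")

def read_braced(src, i):
    """Given src[i] == '{', return (group contents, index just past the matching '}')."""
    depth = 0
    for j in range(i, len(src)):
        if src[j] == "{":
            depth += 1
        elif src[j] == "}":
            depth -= 1
            if depth == 0:
                return src[i + 1:j], j + 1
    raise ValueError("unbalanced braces")

def convert(src, unknown):
    """Render known commands as text; record every other command name in `unknown`."""
    out, i = [], 0
    while i < len(src):
        m = COMMAND.match(src, i)
        if m is None:
            out.append(src[i])  # ordinary text is copied through
            i += 1
            continue
        name, i = m.group(1), m.end()
        if name in KNOWN and i < len(src) and src[i] == "{":
            arg, i = read_braced(src, i)
            out.append(KNOWN[name](convert(arg, unknown)))
        else:
            unknown.append(name)  # e.g. a manual \def: never evaluated, just noted
    return "".join(out)

unknown = []
print(convert(r"\section{Intro}Some \emph{text}. \def\foo{bar}\foo", unknown))
print("unsupported:", unknown)
```

The demo shows the trade-off: \section and \emph come out with their meaning intact, while the \def is only reported as unsupported and its body leaks through as a literal {bar}.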

--
David Maze dmaze@mit.edu http://www.mit.edu/~dmaze/
"Theoretical politics is interesting. Politicking should be illegal."
-- Abra Mitchell
