Re: latex grammar

David Z Maze <dmaze@mit.edu>
3 Apr 2004 09:02:40 -0500

          From comp.compilers


From: David Z Maze <dmaze@mit.edu>
Newsgroups: comp.compilers
Date: 3 Apr 2004 09:02:40 -0500
Organization: Massachusetts Institute of Technology
References: 04-03-099
Keywords: parse
Posted-Date: 03 Apr 2004 09:02:40 EST

"Lukasz" <journey@op.pl> writes:


> Where can I find a grammar for LaTeX so I can create a parser?
> Thanks for any advice.


I suspect you'd have a fair bit of trouble writing a parser for LaTeX
using traditional grammar-based tools like lex and yacc. Fundamentally,
LaTeX is just a set of macros on top of the base TeX language. But
that language has some odd runtime constructs, and it lets you change
the behavior of the scanner at runtime (\catcode can reassign what any
input character means). So you can do tricks like


    \def\makecommand#1#2{\expandafter\def\csname #1\endcsname{#2}}
    \makecommand{foo}{bar}
    \foo % expands to "bar"


to create a command that defines another command whose name is passed
in as an argument, for example. (\expandafter means "set the next
token aside momentarily and expand the one after it once"; so the
runtime first expands \csname #1\endcsname (with #1 = foo) into the
control sequence \foo, then goes back to the \def and now sees
\def\foo{bar}.) There are also irregularities like \verb in the
"standard" LaTeX language, where the argument's delimiter is whatever
character immediately follows the command, as in \verb|x| or \verb+x+.
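To make the \expandafter walkthrough concrete, here's a toy expander
in Python that can run the three lines above. It's a minimal sketch
under heavy assumptions, not how TeX is really implemented: the names
tokenize and Expander are made up, tokens are plain tuples, there are
no category codes or grouping, and only \def with undelimited #1..#9
parameters, \expandafter, and \csname...\endcsname are supported. The
thing to notice is that \def runs while the input is still being read,
so what a later token means depends on what has already executed; a
static grammar has no way to express that.

```python
import re
from collections import deque

# Toy tokens: ("cs", name), ("ch", c), ("param", n). No category codes:
# letters, braces, '#' and '%' have fixed meanings here, unlike real TeX.
TOKEN = re.compile(r"\\([A-Za-z]+) *|\\(.)|#(\d)|%[^\n]*\n?|\n|(.)", re.S)


def tokenize(text):
    tokens = []
    for m in TOKEN.finditer(text):
        word, symbol, param, char = m.groups()
        if word or symbol:
            tokens.append(("cs", word or symbol))
        elif param:
            tokens.append(("param", int(param)))
        elif char is not None:
            tokens.append(("ch", char))
        # comments and bare newlines produce no token (a simplification)
    return tokens


class Expander:
    def __init__(self, tokens):
        self.stream = deque(tokens)
        self.macros = {}  # name -> (parameter count, body tokens)
        self.out = []

    def is_expandable(self, tok):
        return tok[0] == "cs" and (
            tok[1] in self.macros or tok[1] in ("expandafter", "csname"))

    def read_balanced(self):
        # Called just after a "{"; returns the tokens up to its matching "}".
        depth, body = 1, []
        while True:
            tok = self.stream.popleft()
            if tok == ("ch", "{"):
                depth += 1
            elif tok == ("ch", "}"):
                depth -= 1
                if depth == 0:
                    return body
            body.append(tok)

    def read_argument(self):
        tok = self.stream.popleft()
        return self.read_balanced() if tok == ("ch", "{") else [tok]

    def expand_once(self, tok):
        name = tok[1]
        if name == "expandafter":
            # Set the next token aside, expand the one after it, restore.
            first, second = self.stream.popleft(), self.stream.popleft()
            if self.is_expandable(second):
                self.expand_once(second)
            else:
                self.stream.appendleft(second)
            self.stream.appendleft(first)
        elif name == "csname":
            chars = []
            while (t := self.stream.popleft()) != ("cs", "endcsname"):
                if self.is_expandable(t):
                    self.expand_once(t)
                elif t[0] == "ch":
                    chars.append(t[1])
                else:
                    raise ValueError(f"unexpected {t} inside \\csname")
            self.stream.appendleft(("cs", "".join(chars)))
        else:
            nparams, body = self.macros[name]
            args = [self.read_argument() for _ in range(nparams)]
            result = []
            for t in body:
                result.extend(args[t[1] - 1] if t[0] == "param" else [t])
            self.stream.extendleft(reversed(result))

    def do_def(self):
        name_tok = self.stream.popleft()
        nparams = 0
        while self.stream[0] != ("ch", "{"):
            if self.stream.popleft()[0] != "param":
                raise ValueError("only undelimited #1..#9 are supported")
            nparams += 1
        self.stream.popleft()  # the "{" opening the body
        self.macros[name_tok[1]] = (nparams, self.read_balanced())

    def run(self):
        while self.stream:
            tok = self.stream.popleft()
            if self.is_expandable(tok):
                self.expand_once(tok)
            elif tok == ("cs", "def"):
                self.do_def()  # changes what later tokens mean
            elif tok[0] == "ch":
                if tok[1] not in "{}":
                    self.out.append(tok[1])  # braces only group; drop them
            else:
                raise ValueError(f"undefined control sequence \\{tok[1]}")
        return "".join(self.out)
```

Feeding it the three \makecommand lines above, Expander(tokenize(src)).run()
returns "bar". A real evaluator would also need category codes, delimited
parameters, \let, conditionals, and grouping.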


There are probably two ways to go, depending on your goal. You could
write an evaluator for base TeX with some effort; the TeXbook is dense
but should have the information you need. This could read in any TeX,
LaTeX, texinfo, ... document; if you wanted to reimplement the layout
engine or convert to plain text, this might be a good way to go. Or,
you could write an evaluator that understands the higher-level LaTeX
constructs, which I gather is what the various LaTeX-to-foo converters
generally try to do. This has the advantage of capturing the semantic
meaning of, e.g., \section, but it will work poorly on documents that
use more advanced features (manual \defs, packages such as tabularx
that provide features differently).
For doing this, having a formatted copy of the LaTeX macro source is
helpful; you could probably also find a reference card that lists
"all" of the standard LaTeX commands.
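The second approach might look something like the following sketch.
Again this is only an illustration under assumptions, not what any
particular LaTeX-to-foo converter does: HTML as the target, the TAGS
table, and the helpers convert and read_group are all hypothetical,
and it ignores escaped braces, \verb*, and spaces before arguments.
It special-cases \verb, whose delimiter is whatever character follows
it, and it reports any command it doesn't know rather than guessing.

```python
import html
import re

# Hypothetical table: one-argument LaTeX commands this converter knows,
# mapped to HTML tags. Anything not listed is reported as unsupported.
TAGS = {"section": "h2", "subsection": "h3", "emph": "em", "textbf": "strong"}
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)")


def read_group(src, i):
    """Return (contents, index just past the closing brace) for src[i] == '{'.

    Simplification: escaped braces such as \\{ are not handled.
    """
    depth = 0
    for j in range(i, len(src)):
        if src[j] == "{":
            depth += 1
        elif src[j] == "}":
            depth -= 1
            if depth == 0:
                return src[i + 1:j], j + 1
    raise ValueError("unbalanced braces")


def convert(src):
    """Convert a small LaTeX subset to HTML; return (html, unsupported names)."""
    out, unsupported, i = [], [], 0
    while i < len(src):
        c = src[i]
        if c in "{}":
            i += 1  # bare braces only group; drop them
        elif c != "\\":
            out.append(html.escape(c))
            i += 1
        elif (m := CONTROL_WORD.match(src, i)) is None:
            out.append(html.escape(src[i + 1:i + 2]))  # control symbol, e.g. \%
            i += 2
        else:
            name, i = m.group(1), m.end()
            if name == "verb":
                # \verb takes its delimiter from the very next character, so
                # its argument may contain unbalanced braces: \verb|a{b|.
                delim = src[i]
                end = src.index(delim, i + 1)
                out.append("<code>" + html.escape(src[i + 1:end]) + "</code>")
                i = end + 1
            elif name in TAGS and src.startswith("{", i):
                arg, i = read_group(src, i)
                inner, inner_unsupported = convert(arg)
                unsupported.extend(inner_unsupported)
                out.append(f"<{TAGS[name]}>{inner}</{TAGS[name]}>")
            else:
                unsupported.append(name)  # e.g. \def, or a macro it defines
    return "".join(out), unsupported
```

For example, convert(r"\def\foo{bar}\foo") returns ("bar", ["def", "foo",
"foo"]): the text survives, but the definition is lost, which is how manual
\defs defeat this kind of converter.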


--
David Maze dmaze@mit.edu http://www.mit.edu/~dmaze/
"Theoretical politics is interesting. Politicking should be illegal."
-- Abra Mitchell


