Re: Pre-Parsers (VBDis)
13 Sep 2000 20:20:52 -0400

          From comp.compilers

Related articles
Pre-Parsers (Jim Granville) (2000-09-08)
Re: Pre-Parsers (Randall Hyde) (2000-09-09)
Re: Pre-Parsers (2000-09-13)
Re: Pre-Parsers (2000-09-15)
Re: Pre-Parsers (2000-09-21)
Re: Pre-Parsers (Hans-Bernhard Broeker) (2000-10-08)
Re: Pre-Parsers (2000-10-10)
Re: Pre-Parsers (2000-10-12)
Re: Pre-Parsers (2000-10-12)

From: (VBDis)
Newsgroups: comp.compilers
Date: 13 Sep 2000 20:20:52 -0400
Organization: AOL Bertelsmann Online GmbH & Co. KG
References: 00-09-065
Keywords: macros, parse

In article 00-09-065, Jim Granville
<> writes:

> I am looking into pre-parsers, esp. those that also include
> MACRO capability, with the usual define/ifdef/endif.

Ten years ago I implemented such a program in Basic, to extract declarations
from C header files. AFAIR the implementation was as follows:

First the input is tokenized. Several input streams exist, one for
every #include'd source file and one for every #define'd macro. A
stack of input streams is used to allow for #include and macro
expansion. Tokenization is required only for source files; the macro
definition streams already contain tokens. Every stream includes a
special symbol table, containing either the current arguments of a
macro invocation or predefined symbols from the invocation of the
preprocessor.
When an escape character (#) is found, the following input is
redirected to the preprocessor and interpreted immediately, until the
whole preprocessor statement has been processed. In this state an
interpretation of constant expressions is required, in order to
determine the values of conditional expressions. Global semaphores
are needed, e.g. to prevent the delivery of tokens inside the FALSE
branches of conditional "statements".
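
The "global semaphore" for conditional sections can be modelled as a stack of booleans, one per nested conditional; tokens are delivered only while every enclosing conditional is TRUE. A hedged sketch under simplifying assumptions (line-based input, only #ifdef/#else/#endif, no #elif or expression evaluation):

```python
# Sketch: suppress delivery inside FALSE conditional branches.
def preprocess_lines(lines, defined):
    active = [True]                 # one entry per nested conditional
    out = []
    for line in lines:
        s = line.strip()
        if s.startswith('#ifdef'):
            active.append(s.split()[1] in defined)
        elif s.startswith('#else'):
            active[-1] = not active[-1]
        elif s.startswith('#endif'):
            active.pop()
        elif all(active):           # deliver only in TRUE branches
            out.append(line)
    return out

src = ['int a;', '#ifdef DEBUG', 'int dbg;', '#else', 'int rel;', '#endif']
print(preprocess_lines(src, {'NDEBUG'}))   # ['int a;', 'int rel;']
```
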

When an input token matches a predefined symbol (macro argument...),
it is replaced by the value of that symbol. Within a macro definition
the token is converted into a reference to the corresponding macro
argument, and is substituted again by the current macro parameter
when the macro is invoked later.

When an input token matches a macro symbol, the macro arguments are
first parsed into the macro argument list, and input is then switched
to the macro definition stream.
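
A minimal sketch of that invocation step, assuming one token per argument for brevity: the arguments are bound into a per-expansion symbol table, and each body token that names a parameter is replaced by the current argument. (Names and data layout are invented for illustration.)

```python
# Sketch: expand a macro by binding arguments to parameters and
# substituting parameter references in the stored definition tokens.
def expand(name, macros, args):
    params, body = macros[name]
    symbols = dict(zip(params, args))      # per-expansion symbol table
    # replace each body token that matches a macro parameter
    return [symbols.get(tok, tok) for tok in body]

macros = {'SQR': (['x'], ['(', 'x', '*', 'x', ')'])}
print(expand('SQR', macros, ['a']))   # ['(', 'a', '*', 'a', ')']
```

In the scheme described above, this token list would simply be pushed as a new input stream rather than returned.
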

Finally, the tokens delivered by the current input stream are written
either to the output or, after a #define token has been encountered,
to a macro definition table.
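
That output switch can be sketched like this, assuming for simplicity a parameterless #define whose definition runs to the end of the line (marked here by a '\n' token):

```python
# Sketch: after a '#define' token, the following tokens are routed
# into the macro table instead of the output.
def collect(tokens):
    macros, output = {}, []
    it = iter(tokens)
    for tok in it:
        if tok == '#define':
            name = next(it)
            body = []
            t = next(it, '\n')
            while t != '\n':        # definition ends at end of line
                body.append(t)
                t = next(it, '\n')
            macros[name] = body
        elif tok != '\n':
            output.append(tok)
    return macros, output

macros, output = collect(['#define', 'PI', '3', '\n', 'x', '=', 'PI'])
# macros == {'PI': ['3']}, output == ['x', '=', 'PI']
```
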

I don't remember whether or how concatenation (the ## operator in C)
was implemented; here an evaluation of the left and right side may be
necessary before the resulting strings are merged and tokenized again.
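
As a hedged sketch of the merging step only (after any argument substitution, and without the re-tokenization a real implementation would need when the pasted result is not a single token):

```python
# Sketch: merge the left and right neighbours of each '##' token.
def paste(tokens):
    out = []
    for tok in tokens:
        if out and out[-1] == '##':
            out.pop()                   # drop the '##'
            out[-1] = out[-1] + tok     # merge with left neighbour
        else:
            out.append(tok)
    return out

print(paste(['var', '##', '1', '+', 'x']))   # ['var1', '+', 'x']
```
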

The whole thing is a state machine, with several stacks for the
states, symbol tables (scopes) for macro definitions and other
symbols, and the input and output streams. Every I/O stream can be
another state machine, which behaves differently according to both the
private and global state. The states reflect the handling of the input
tokens, which can be passed, skipped or evaluated. These states may
be different for several token classes, like ordinary tokens,
preprocessor directives, symbols, constants etc.

Some features can require more processing, like the evaluation of
sizeof(x) in C. In this case all type and variable declarations must
also be stored by the parser, so that the size of every declared
symbol can be evaluated in conditional expressions. At the same time
nested scopes must be implemented, so that the parser can find the
appropriate definition of a symbol within the current nesting of
subroutine declarations etc. Such features are closely related to a
specific compiler, and that's why you'll never find a "general"
preprocessor for the current C standard. Even the stand-alone
preprocessors, shipped with some C compilers, may be usable for some
general preprocessing, but may fail to produce correct output for a
different C compiler.

