Re: Parsing partial sentences

Hans-Peter Diettrich <>
Thu, 20 Apr 2017 16:14:43 +0200

          From comp.compilers

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: Thu, 20 Apr 2017 16:14:43 +0200
Organization: Compilers Central
References: <> 17-04-011
Keywords: parse
Posted-Date: 20 Apr 2017 10:34:57 EDT

On 11.04.2017 at 11:30, Martin Ward wrote:
> On 07/04/17 21:24, Hans-Peter Diettrich wrote:

> On 07/04/17 21:24, John wrote:
>> Since preprocessor macros are text macros, there's no reason to
>> expect a macro's expansion can be parsed at all.
> This is true: handling the full generality of what is possible
> with #define macros without first expanding all the macros
> is (almost) impossible.

After thinking more about macro expansion, I now believe that the macro
text should be interpreted only when a macro is actually used, so that
all referenced identifiers (other macros, parameters, variables,
functions) are known to the parser/compiler. At that point the
non-terminal returned for the expanded macro can be used immediately to
determine a possible macro conversion.
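
A minimal C sketch of why the macro text can only be interpreted at the
point of use. The names SCALE, factor, offset and scaled are made up for
this illustration and are not taken from the convertor.

```c
/* At the #define, neither `factor` nor `offset` has been declared, so
   the macro body is only a token sequence with no determinable type. */
#define SCALE(x) ((x) * factor + offset())

static int factor = 3;                  /* declared after the macro */
static int offset(void) { return 1; }   /* declared after the macro */

/* Only at this use site can a parser resolve every identifier in the
   expansion and see that it forms an int expression. */
int scaled(int v)
{
    return SCALE(v);
}
```

Both `factor` and `offset` have file scope here, so this expansion
would be a candidate for conversion into a function.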

Some attributes should be added to the parse tree, so that it will be
possible to determine the macro name for every token, if it results from
a macro expansion, and the kind, type and scope of all used identifiers.
As long as only literal constants occur in the parse tree, the macro can
be converted into a named constant[1]. If identifiers occur as well,
more checks are required to reject e.g. identifiers of non-global scope,
before the macro can be converted into a function[2].
If all tests are passed, the macro equivalent can be constructed and
stored, for later use. If such an equivalent has already been stored,
the new construct has to be compared with that first construct, to sort
out possible variations due to macro redefinition or varying macro
argument types. It may be possible to handle varying argument types by a
transformation into generic functions, or by type expansion.
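
The checks described above could be sketched roughly as follows. This is
only an illustration under simplifying assumptions: the token categories,
the type names and both function names are hypothetical, and the
comparison with a stored equivalent is reduced to comparing the
classification result instead of the whole construct.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories attached to the tokens of one expansion. */
typedef enum { TOK_LITERAL, TOK_OPERATOR, TOK_GLOBAL_ID, TOK_LOCAL_ID } TokKind;

/* Hypothetical outcomes of the classification. */
typedef enum { CONV_CONSTANT, CONV_FUNCTION, CONV_EXPAND_ONLY } Conversion;

/* Literals and operators only  -> named constant.
   Global identifiers as well   -> candidate for a function.
   Any non-global identifier    -> no conversion, keep expanding. */
Conversion classify_expansion(const TokKind *toks, size_t n)
{
    bool has_identifier = false;
    for (size_t i = 0; i < n; i++) {
        if (toks[i] == TOK_LOCAL_ID)
            return CONV_EXPAND_ONLY;
        if (toks[i] == TOK_GLOBAL_ID)
            has_identifier = true;
    }
    return has_identifier ? CONV_FUNCTION : CONV_CONSTANT;
}

/* A later expansion has to agree with the stored result; a mismatch
   hints at a macro redefinition or at differing argument types. */
bool consistent_with_stored(Conversion stored, Conversion fresh)
{
    return stored == fresh;
}
```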

Provisions are also required (and already implemented) to prevent
duplicate parsing of properly guarded header files, which could
otherwise confuse the automatic conversion about earlier macro
expansions.
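
One simple way to avoid parsing a header twice is to remember which
files have been seen. The sketch below is an assumption about how this
could look, not the convertor's code. The function name, the fixed table
size and the lookup by path string are all made up, and it assumes the
caller has already verified that the file is properly guarded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 256

/* Paths of guarded headers parsed so far. Only the pointers are kept,
   so the strings must outlive the table. */
static const char *parsed[MAX_HEADERS];
static size_t parsed_count = 0;

/* Returns true if the header has to be parsed now, false if it was
   already parsed. When the table is full the header is parsed again,
   which is slower but not wrong. */
bool should_parse_header(const char *path)
{
    for (size_t i = 0; i < parsed_count; i++)
        if (strcmp(parsed[i], path) == 0)
            return false;
    if (parsed_count < MAX_HEADERS)
        parsed[parsed_count++] = path;
    return true;
}
```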

Did I miss something?

[1] Since macros can be defined in any order, it's not guaranteed that
all used identifiers have already been defined or declared before.
That's why their classification has to be delayed, at least until all
macros have been defined, e.g. at the end of a header file.
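
A small C example of the ordering problem, with made-up macro names:
AREA cannot be classified when it is defined, because WIDTH and HEIGHT
are not yet known as macros at that point.

```c
/* When AREA is defined, WIDTH and HEIGHT are still unknown names. */
#define AREA   (WIDTH * HEIGHT)
#define WIDTH  4
#define HEIGHT 5

/* After all three definitions the expansion contains only literals,
   so AREA is usable as an integer constant expression. */
enum { AREA_VALUE = AREA };

int area(void) { return AREA; }
```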

As long as all uses (actual expansions) of a macro have to be checked,
an additional check for constant definitions seems dispensable. OTOH
it's desirable to also convert header files into the target language,
for immediate use later. Since typically no macro expansion occurs in
header files, and even in a source module not all macros are used, a
special scan for constant macros still looks very helpful to me.
Possibly a pseudo module can be constructed for every header file,
containing a function that uses all macros (without arguments?) which
are #defined in the header. The parser may have to be modified for that
purpose, to skip over macros which do not expand immediately into valid
source code.
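
Such a pseudo module might look like the sketch below. The header
contents and all names are invented for illustration. The macro
BUF_BEGIN stands for a macro that does not expand into valid source
code on its own and therefore has to be skipped.

```c
/* Stand-in for the #defines found in one header file. */
#define BUF_SIZE   512
#define BUF_COUNT  (2 * 4)
#define BUF_NAME   "scratch"
#define BUF_BEGIN  {            /* not an expression by itself */

/* Generated function that forces an expansion of every macro without
   arguments, so that each expansion can be parsed and classified. */
void use_all_macros(void)
{
    (void)(BUF_SIZE);
    (void)(BUF_COUNT);
    (void)(BUF_NAME);
    /* BUF_BEGIN is left out: its expansion is not valid here. */
}
```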

[2] Possibly even local variables may become acceptable in converted
functions, when the resulting function becomes local to the scope of the
variables and the invoking function. Here OPL puts a limit on
"localization", because functions can be local only to another function,
not to arbitrary code blocks. Similarly, local variables can only be
local to a function, not to code blocks. This case is already handled by
the convertor.

> However, in practice it depends on the purpose of your C to Pascal
> convertor. If it is to be a general purpose tool which has to be
> able to handle any C source code that anyone throws at it,
> then you are stuck.

See above, my convertor only accepts valid C code. To that extent the
user can throw any valid C source file at it. The only requirement is
that the original compiler and its libraries are made known to the
convertor, so that all compiler-predefined macros are known, and all
#included header files are available for parsing.

This is not really different from the usual invocation of a compiler,
which must be given a search path to all used header files. A couple of
compiler-specific extensions to the C standard are already
implemented, like the strange macro expansion of '/'##'/' into "//",
found in MSVC, or #include_next in gcc.

> An intermediate approach might be to keep all the macros
> which look straightforward and fall back on expanding macros
> which do weird things with the syntax.

That's the current implementation, which does not yet handle translated
#defines at all.
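
To make the distinction concrete, here is a C sketch with invented
macros. MAX_LEN and SQR look straightforward and could be kept as a
constant and a function. BEGIN and END change the syntax itself, so
expanding them is the only practical option.

```c
#define MAX_LEN 64              /* straightforward: a named constant */
#define SQR(x)  ((x) * (x))     /* straightforward: could be a function */

#define BEGIN   {               /* alters the syntax: expand before parsing */
#define END     }

int sum_squares(int n)
BEGIN
    int total = 0;
    for (int i = 1; i <= n && i <= MAX_LEN; i++)
        total += SQR(i);
    return total;
END
```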

Thanks for all the inspiration on my project :-)

