Re: Parsing partial sentences

George Neuner <>
Tue, 11 Apr 2017 23:05:02 -0400

          From comp.compilers

Related articles
[5 earlier articles]
Re: Parsing partial sentences (Hans-Peter Diettrich) (2017-04-07)
Re: Parsing partial sentences (George Neuner) (2017-04-10)
Re: Parsing partial sentences (Hans-Peter Diettrich) (2017-04-11)
Re: Parsing partial sentences (Martin Ward) (2017-04-11)
Re: Parsing partial sentences (Hans-Peter Diettrich) (2017-04-11)
Re: Parsing partial sentences (Martin Ward) (2017-04-11)
Re: Parsing partial sentences (George Neuner) (2017-04-11)
Re: Parsing partial sentences (Hans-Peter Diettrich) (2017-04-12)
Re: Parsing partial sentences (Hans-Peter Diettrich) (2017-04-20)
Re: Parsing partial sentences (George Neuner) (2017-04-21)
Re: Parsing partial sentences (Walter Banks) (2017-04-27)
Re: Parsing partial sentences (Kaz Kylheku) (2017-04-27)
Re: Parsing partial sentences (Hans-Peter Diettrich) (2017-04-28)
| List of all articles for this month |

From: George Neuner <>
Newsgroups: comp.compilers
Date: Tue, 11 Apr 2017 23:05:02 -0400
Organization: A noiseless patient Spider
References: 17-04-001 17-04-002 17-04-003 17-04-004 17-04-006 17-04-007 17-04-008
Injection-Info:; posting-host=""; logging-data="5073"; mail-complaints-to=""
Keywords: C, parse
Posted-Date: 11 Apr 2017 23:11:52 EDT

On Tue, 11 Apr 2017 10:31:30 +0200, Hans-Peter Diettrich
<> wrote:

>Am 11.04.2017 um 02:22 schrieb George Neuner:
>> You could do it in either LL or LR ... LR would be more runtime
>> efficient, but I think it would not be any easier to create the parser
>> in the first place.
>Perhaps I should clarify: with LL I mean top-down, with LR a bottom-up

I know what you mean by LL and LR. Apart from their Chomsky meanings,
they also refer to canonical implementations.

There may, in fact, be state machine LL and/or recursive descent LR
implementations - but if so, I have neither seen nor heard of them.

In terms of implementation, bottom-up [state machine] tends to be much
more efficient at handling languages that have less structure.

C's syntax is (structured but) far more permissive than Pascal's:
e.g., C allows declarations to be intermixed with other statements,
the only real requirements being that the declaration be in scope of
and preceed any use of the declared object.

A #define body is free form text. There are no syntax rules for
#defines - all that is required (for compilation) is that the body
make sense /in context/ after being substituted into the code.

>> Building on John's example, consider what you'd do with
>> # define FOO +
>> # define BAR + 42
>> # define BAZ + c /* note 'c' is undefined */
>All these can not be reduced into constants or functions. Eventually a
>problem may arise for "+ 42", where the '+' could be interpreted as the
>sign of a constant value.

The point is that the preprocessor doesn't care about the meanings of
tokens in the #define body. Modulo parameter substitution, the
preprocess simply inlines the #define body at the point of use.

>In this case the following code
>> int a, b;
>> :
>> a = a FOO b BAR BAZ;
>would raise a parser error "expecting operator between BAR and BAZ".

After preprocessing, the C compiler will see

      a = a + b + 42 + c;

The compiler will complain about 'c' being undeclared, not about a
missing operator. Try it.

>A more practical example were windows.h. I expect much more than 50%
>named constants in it, which could be detected and converted easily.
>Then this automated handling could reduce the manual inspection and
>classification of many hundreds (thousands?) of #defines to a few
>unhandled or not easily translatable macros.

In the case of Windows: 10s of thousands, counting all the nested

You are undoubtedly correct regarding the majority of headers ... as
John mentioned, constant definitions and real inline functions are
used more often now than "functional" #define macros.

But you still are likely to see weird stuff in older code, and system
and i/o headers in particular often define parameterized macros to
wrap intrinsics and low level library functions.

Macro substitution variables, if any, are local to the #define body -
they are not free references to C variables. So if you see something

# define dumb(x) (x << y)

then you have a bound parameter 'x', a free variable 'y', and an
ambiguous expression of unknown types which must *not* cause an error
during parsing of the macro, but *must* cause an error if the fully
expanded and substituted expression is invalid AT THE POINT OF USE.

You rarely will have any clue regarding the types of variables that
appear in a macro. In the example above, 'x' is a purely mathematical
variable, and the type of 'y' is unknown. In context at the point of
substitution, when the types finally become known, the '<<' operator
may not even be valid.

>Our esteemed moderator wrote:
>[I think you may be trying to solve the wrong problem. Really, you
>cannot preparse C #define statements unless you're prepared to handle
>irregular parse tree fragments. In most compilers, the lexer takes
>vastly more time than the parser, so if you store tokens rather than
>text, you'll get most of the performance. Also, one of the reasons C
>and C++ added const and inline is that they do most of what #define
>does while being normal syntax.
>Here's a thought that sometimes works: try parsing the #define text,
>if it succeeds store the parsed version, otherwise store the text.
>But be prepared for that to fail, e.g.:
>#define FOO a + b
> d = FOO * c;
>be sure that your preparsed FOO doesn't expand into (a+b)*c. That
>may well be what the programmer meant, but it's not what she said.



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.