Re: Parsing partial sentences

Kaz Kylheku <>
Thu, 27 Apr 2017 19:08:09 +0000 (UTC)

          From comp.compilers


From: Kaz Kylheku <>
Newsgroups: comp.compilers
Date: Thu, 27 Apr 2017 19:08:09 +0000 (UTC)
Organization: NNTP Server
References: 17-04-001 17-04-023
Keywords: parse
Posted-Date: 27 Apr 2017 21:15:16 EDT

On 2017-04-27, Walter Banks wrote:
> On 2017-04-03 3:57 AM, Hans-Peter Diettrich wrote:
>> Is there an easy way to parse e.g. C #defines into constants,
>> functions or other non-terminals, which are not the goal of the
>> entire grammar?
> In a word NO. #defines are always strings even when they look like
> constants, something I have found out the hard way. There have only been
> two ways that I have successfully dealt with #defines: a preprocessor
> pass or later and much faster pipeline the processing of C source and
> add the defined definition processing into part of the source fetch
> handling.

If we allow Pascal to be extended with a macro preprocessor, I believe
I could design a system for translating C to Pascal which handles some
macros, translating them to Pascal macros. That includes even some
macros that "break" syntactic boundaries, such as "list_for_each (var,
list) { block }".
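
To make "breaking syntactic boundaries" concrete, here is a small sketch in
Python. The macro body below is a hypothetical definition; the name
list_for_each is from the text above, but its body, the helper expand_call
and the sample arguments are assumptions made for illustration only. The
point it shows is that the macro call expands to a loop header alone, and
only becomes a complete statement once the following block is attached.

```python
import re

# Hypothetical definition; the post names the macro but does not give its body:
#   #define list_for_each(var, list) for (var = (list); var != NULL; var = var->next)
PARAMS = ["var", "list"]
BODY = "for (var = (list); var != NULL; var = var->next)"


def expand_call(args):
    """Positional substitution of the arguments into the macro body."""
    mapping = dict(zip(PARAMS, args))
    # Replace each identifier that names a parameter; leave everything else alone.
    return re.sub(r"\b(\w+)\b", lambda m: mapping.get(m.group(1), m.group(1)), BODY)


# The call "list_for_each (n, items)" yields only a loop header ...
header = expand_call(["n", "items"])
# ... and the block written after the call completes the statement.
statement = header + " { visit(n); }"
print(header)
print(statement)
```

A translator that works on whole syntax-tree nodes therefore cannot treat
the expansion of this macro as a node by itself; it has to recognize that
the macro contributes the header of a node whose body lies outside the call.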

I don't believe such a project has any value beyond getting
a pat on the back from another developer; I wouldn't spend any
time on such a thing. The end result might well be rejected by some
Pascal users, due to requiring the extended dialect, whether on
ideological grounds, or on practical issues with tooling (being able to
get the preprocessor running in a given Pascal development environment).

Here is a very high level sketch of the approach:

- We preprocess the C translation unit fully before parsing it.
    - However, we use our own specialized C preprocessor which is
        tightly integrated into our translator.
    - Our specialized C preprocessor carefully tracks, in detail,
        the origin of every piece of syntax back to the macro which
        substituted that syntax, either as an argument or as body
        material.
    - As our parser is analyzing the code, it preserves this information
        in the abstract syntax tree: every tree fragment, if it
        was the result of a macro expansion, is traced to the
        macro call. We also know when an entire node was the result
        of a macro call, and what that macro call looked like.
    - When we output the Pascal translation, if a tree node was the
        result of a macro call (a macro call that we were successfully
        able to treat with our magic algorithms) then rather than outputting
        the Pascal translation of that tree node, we output the macro
        call syntax (remembering that our Pascal dialect has a preprocessor
        to handle that).
    - We have a magic algorithm for reconstructing Pascal versions
        of macro bodies which works roughly like this:
        - When we translate a tree node from C to Pascal, if that tree node
            came from macrology, we keep track of which Pascal fragments
            correspond to C fragments.
        - We then reverse the macrology: we analyze the Pascal and see which
            fragments correspond to C material that came from a macro body,
            and which came from substitution of macro arguments and such.
            - From this we can reconstruct a Pascal macro body, using the
                corresponding Pascal fragments.
                - A Pascal fragment which corresponds to the insertion
                    of some argument X is just represented by the same X in the
                    Pascal macro body.
                - A Pascal fragment which corresponds to the insertion of
                    some body template material Y is replaced in the Pascal
                    version of the macro by the corresponding Pascal piece.
        - Since a macro call often occurs in more than one place, we can
            somehow combine information from multiple sites to improve the
            translation, or at least validate that the same thing is

    - The saving grace here, which makes this even contemplatable, is
        that C macros are dumb positional substitution: just the pasting
        together of fragments. If C macros were like Lisp macros, doing
        arbitrary Turing-complete computation, good luck with this
        approach, right?

    - I think it is best to forget C99 variadic macros and such cruft.

    - The system could have a mode whereby it gives up trying to translate
        a C macro to Pascal, but at least reverses the syntax, so that
        the overall Pascal fragment corresponding to the C code which came
        from that macro call is turned back into a macro call. A diagnostic
        can be generated, "missing macro needed", and the users themselves
        can use their human intelligence to finish the job (if possible):
        find the C version of the macro and try to translate it.
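
The core of the sketch above (provenance tracking, then reversing the
macrology on the translated output) can be illustrated with a toy in Python.
Everything named here is an assumption made for illustration: the Tok
record, the NEITHER macro, and above all the token-by-token C_TO_PASCAL
table, which merely stands in for a real translator working on syntax
trees. It is a minimal sketch of the idea, not a design for the actual
system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tok:
    text: str
    param: Optional[str] = None  # parameter name, if substituted as an argument
    site: Optional[int] = None   # index in the macro body this token came from


def expand(params: List[str], body: List[str], args: List[List[str]]) -> List[Tok]:
    """Dumb positional substitution that records the origin of every token."""
    out: List[Tok] = []
    for i, t in enumerate(body):
        if t in params:
            out.extend(Tok(a, param=t, site=i) for a in args[params.index(t)])
        else:
            out.append(Tok(t, site=i))
    return out


# Toy stand-in for the real C-to-Pascal translator (an assumption of this sketch).
C_TO_PASCAL = {"==": "=", "!=": "<>", "&&": "and", "||": "or", "!": "not"}


def translate(tokens: List[Tok]) -> List[Tok]:
    """Translate token by token, carrying the provenance along."""
    return [Tok(C_TO_PASCAL.get(t.text, t.text), t.param, t.site) for t in tokens]


def reconstruct_body(pascal: List[Tok]) -> List[str]:
    """Rebuild a Pascal macro body from one translated expansion."""
    out: List[str] = []
    last_site = None
    for t in pascal:
        if t.param is None:
            out.append(t.text)   # body template material: keep the Pascal piece
        elif t.site != last_site:
            out.append(t.param)  # argument material: collapse back to the name X
        last_site = t.site
    return out


# Hypothetical C macro: #define NEITHER(a, b) (!(a) && !(b))
PARAMS = ["a", "b"]
BODY = ["(", "!", "(", "a", ")", "&&", "!", "(", "b", ")", ")"]

# Two call sites: NEITHER(p == q, r != 0) and NEITHER(x, y)
site1 = translate(expand(PARAMS, BODY, [["p", "==", "q"], ["r", "!=", "0"]]))
site2 = translate(expand(PARAMS, BODY, [["x"], ["y"]]))

pascal_expansion = " ".join(t.text for t in site1)
pascal_body = " ".join(reconstruct_body(site1))
# Cross-check: a second call site should reconstruct the same macro body.
consistent = reconstruct_body(site1) == reconstruct_body(site2)
print(pascal_expansion)
print(pascal_body)
print(consistent)
```

For the first call site the translated expansion is
"( not ( p = q ) and not ( r <> 0 ) )", and the reconstructed Pascal macro
body is "( not ( a ) and not ( b ) )"; the second site reconstructs the
same body, which is the kind of cross-site validation mentioned above. A
real translator would not map tokens one to one, so the hard part, which
this toy skips, is keeping the C-to-Pascal fragment correspondence through
a tree-level translation.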
