Re: Supporting multiple input syntaxes

Kaz Kylheku <>
Thu, 13 Aug 2020 00:43:47 +0000 (UTC)

          From comp.compilers


From: Kaz Kylheku <>
Newsgroups: comp.compilers
Date: Thu, 13 Aug 2020 00:43:47 +0000 (UTC)
Organization: NNTP Server
References: 20-08-002
Keywords: C, parse
Posted-Date: 13 Aug 2020 18:22:32 EDT

On 2020-08-12, luser droog <> wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
> Any guidance in this area?

I'd say that since you're not using a parser generator, but building the
grammar objects with ordinary code at initialization time, you have the
flexibility to merge the implementations: check the value of some
dialect-selecting variable while constructing the parser, and check that
same variable elsewhere for whatever else needs to be done conditionally.
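
As a rough sketch of that idea in C (not your project's actual API): the
names `enum dialect`, `struct grammar`, `build_grammar` and `is_keyword`
are all made up for illustration, and the "grammar" is reduced to a
keyword table plus two feature flags. The per-dialect keyword sets are
simplified and should be checked against the respective manuals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dialect selector; names are illustrative only. */
enum dialect { DIALECT_C75, DIALECT_KR1, DIALECT_C90 };

#define MAX_KEYWORDS 16

/* Stand-in for the grammar objects: a keyword table plus feature flags. */
struct grammar {
    const char *keywords[MAX_KEYWORDS];
    size_t nkeywords;
    int allow_prototypes;     /* C90-style parameter type lists */
    int allow_old_assign_ops; /* 1975-style =+ and =- spellings */
};

static void add_keyword(struct grammar *g, const char *kw)
{
    assert(g->nkeywords < MAX_KEYWORDS);
    g->keywords[g->nkeywords++] = kw;
}

/* Build one grammar, consulting the dialect only where dialects differ. */
void build_grammar(struct grammar *g, enum dialect d)
{
    /* Simplified, assumed keyword sets; not a complete list. */
    static const char *const common[] = { "int", "char", "float", "double", "struct" };
    static const char *const kr1[] = { "short", "long", "unsigned", "union", "typedef" };
    static const char *const c90[] = { "void", "const", "volatile", "signed", "enum" };
    size_t i;

    g->nkeywords = 0;
    g->allow_prototypes = (d == DIALECT_C90);
    g->allow_old_assign_ops = (d == DIALECT_C75);

    for (i = 0; i < sizeof common / sizeof common[0]; i++)
        add_keyword(g, common[i]);
    if (d >= DIALECT_KR1)
        for (i = 0; i < sizeof kr1 / sizeof kr1[0]; i++)
            add_keyword(g, kr1[i]);
    if (d == DIALECT_C90)
        for (i = 0; i < sizeof c90 / sizeof c90[0]; i++)
            add_keyword(g, c90[i]);
}

/* Return 1 if word is a keyword in the grammar that was built, else 0. */
int is_keyword(const struct grammar *g, const char *word)
{
    size_t i;
    for (i = 0; i < g->nkeywords; i++)
        if (strcmp(g->keywords[i], word) == 0)
            return 1;
    return 0;
}
```

The shared part is written once, and each dialect difference is a single
conditional next to the thing it affects, rather than a separate copy of
the whole grammar.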

The trick is to find a way to embed the *semantics* of the older dialects
into the newest one, so that everything after the parsing can be shared.
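
One small illustration of what that could look like, under assumptions of
my own: the old `=+` and `=-` spellings of the compound assignment
operators are mapped to the same node kind as the later `+=` and `-=`,
so nothing downstream has to know which dialect was parsed. The names
`enum assign_op` and `assign_op_for` are hypothetical, and treating K&R1
as accepting only the newer spellings is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dialect selector; names are illustrative only. */
enum dialect { DIALECT_C75, DIALECT_KR1, DIALECT_C90 };

/* Dialect-neutral operator kinds, as the shared back end would see them. */
enum assign_op { ASSIGN_NONE, ASSIGN_PLAIN, ASSIGN_ADD, ASSIGN_SUB };

/*
 * Map an operator lexeme to the shared operator kind for dialect d.
 * Returns ASSIGN_NONE if the lexeme is not an assignment operator there.
 */
enum assign_op assign_op_for(enum dialect d, const char *lexeme)
{
    assert(lexeme != NULL);

    if (strcmp(lexeme, "=") == 0)
        return ASSIGN_PLAIN;

    if (d == DIALECT_C75) {
        /* Old spelling: operator after the equals sign. */
        if (strcmp(lexeme, "=+") == 0) return ASSIGN_ADD;
        if (strcmp(lexeme, "=-") == 0) return ASSIGN_SUB;
    } else {
        /* Later spelling: operator before the equals sign. */
        if (strcmp(lexeme, "+=") == 0) return ASSIGN_ADD;
        if (strcmp(lexeme, "-=") == 0) return ASSIGN_SUB;
    }
    return ASSIGN_NONE;
}
```

Both `assign_op_for(DIALECT_C75, "=+")` and `assign_op_for(DIALECT_C90, "+=")`
yield `ASSIGN_ADD`, which is the only thing later passes would look at.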

Similar remarks would apply to recursive descent.

If you were using something clunky like a Yacc, there are still ways
to combine everything into a single grammar. The input stream can be
primed with one of several "secret internal token" objects that have no
lexeme. (Primed, meaning that the first call to the lexer yields this
secret token instead of processing actual input.) Each token indicates
a dialect to parse. The top-level grammar production can then pick
one of several subordinate production rules corresponding to the entry
points for the respective dialects. Those can then share common rules
as much as possible.

    translation_unit : C75_TOKEN c75_translation_unit /* orig flavor */
                     | C79_TOKEN c79_translation_unit /* "K&R" */
                     | C90_TOKEN c90_translation_unit /* ANSI/ISO */
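
The priming on the lexer side might be sketched like this. Only the three
`C..._TOKEN` names come from the grammar above; `struct lexer`,
`lexer_init`, `lexer_next`, `TOK_CHAR` and `TOK_EOF` are hypothetical, and
the "real" lexing is reduced to one token per input character. With an
actual Yacc the same logic would sit at the top of `yylex()`, with the
token values taken from the generated header.

```c
#include <assert.h>

/* Token codes; in a real Yacc setup these come from the generated header. */
enum token { TOK_EOF = 0, C75_TOKEN = 256, C79_TOKEN, C90_TOKEN, TOK_CHAR };

struct lexer {
    const char *input;      /* remaining input text */
    enum token start_token; /* secret token selecting the dialect */
    int primed;             /* nonzero until the secret token is delivered */
};

void lexer_init(struct lexer *lx, const char *input, enum token start_token)
{
    assert(start_token == C75_TOKEN || start_token == C79_TOKEN ||
           start_token == C90_TOKEN);
    lx->input = input;
    lx->start_token = start_token;
    lx->primed = 1;
}

enum token lexer_next(struct lexer *lx)
{
    /* First call: yield the secret token without touching the input. */
    if (lx->primed) {
        lx->primed = 0;
        return lx->start_token;
    }
    /* Placeholder for real lexing: one token per character. */
    if (*lx->input == '\0')
        return TOK_EOF;
    lx->input++;
    return TOK_CHAR;
}
```

Since the secret tokens never correspond to any input text, a source file
cannot select a dialect entry point by accident; only the driver that
calls `lexer_init` can.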

TXR Programming Language:
Music DIY Mailing List:
ADA MP-1 Mailing List:
