Re: Parser Reversed

"Matt P. Dziubinski" <>
Sun, 11 Mar 2018 15:08:54 +0100

          From comp.compilers

Related articles
Parser Reversed (Hans-Peter Diettrich) (2018-03-11)
Re: Parser Reversed (Matt P. Dziubinski) (2018-03-11)
Re: Parser Reversed (Kaz Kylheku) (2018-03-12)
Re: Parser Reversed (Hans-Peter Diettrich) (2018-03-13)
Re: Parser Reversed (Hans-Peter Diettrich) (2018-03-13)

From: "Matt P. Dziubinski" <>
Newsgroups: comp.compilers
Date: Sun, 11 Mar 2018 15:08:54 +0100
References: 18-03-038
Keywords: parse, tools, question
Posted-Date: 12 Mar 2018 16:18:26 EDT

On 3/11/2018 08:32, Hans-Peter Diettrich wrote:
> A grammar can be used to *check* for valid sentences of a language, but
> it also can be used to *create* valid sentences. For a pretty printer or
> decompiler test I need a sentence generator for logical expressions. For
> now the language can be restricted to AND, OR, variables and (kind of)
> parentheses. Later on NOT and XOR can be added. RPN is one alternative
> for the "kind of parentheses", eliminating the need for a specific
> operator precedence.
> Now I'm looking for possible implementations of such a generator, in
> addition to my own ideas. So far the output can be anything, e.g. source
> code or machine code, or some tree (AST...).
> Any ideas or references to such projects?


Csmith comes to mind:

Reference: Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. PLDI
2011. "Finding and Understanding Bugs in C Compilers"
LtU post:

Summary (from the paper): "The shape of a program generated by Csmith is
governed by a grammar for a subset of C. A program is a collection of
type, variable, and function definitions; a function body is a block; a
block contains a list of declarations and a list of statements; and a
statement is an expression, control-flow construct (e.g., `if`,
`return`, `goto`, or `for`), assignment, or block. Assignments are
modeled as statements -- not expressions -- which reflects the most common
idiom for assignments in C code. We leverage our grammar to produce
other idiomatic code as well: in particular, we include a statement kind
that represents a loop iterating over an array. The grammar is
implemented by a collection of hand-coded C++ classes."
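On a much smaller scale, the same grammar-driven idea can be applied to the AND/OR expressions from the question. The following is only a minimal sketch in Python, not Csmith's approach or code: the toy grammar, the variable names, and the function names `generate`, `to_rpn` and `to_infix` are all assumptions made for illustration. It builds a random tree and renders it either as RPN (no parentheses needed) or as infix with parentheses only where precedence requires them.

```python
import random

# Hypothetical toy grammar (an assumption, not taken from Csmith):
#   expr ::= VAR | expr AND expr | expr OR expr
VARIABLES = ["a", "b", "c"]
OPERATORS = ["AND", "OR"]
PRECEDENCE = {"OR": 1, "AND": 2}  # AND binds tighter than OR


def generate(rng, max_depth):
    """Return a random tree: a variable name or a tuple (op, left, right)."""
    # Force a leaf at depth 0 so that the recursion always terminates.
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(VARIABLES)
    op = rng.choice(OPERATORS)
    return (op, generate(rng, max_depth - 1), generate(rng, max_depth - 1))


def to_rpn(node):
    """Postfix rendering: operands first, then the operator."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"{to_rpn(left)} {to_rpn(right)} {op}"


def to_infix(node, parent_prec=0):
    """Infix rendering, parenthesizing only where precedence requires it."""
    if isinstance(node, str):
        return node
    op, left, right = node
    prec = PRECEDENCE[op]
    # The right operand is rendered with prec + 1, so a right-nested tree
    # such as a OR (b OR c) keeps its shape when parsed left-associatively.
    text = f"{to_infix(left, prec)} {op} {to_infix(right, prec + 1)}"
    return f"({text})" if prec < parent_prec else text


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed makes failing cases reproducible
    for _ in range(3):
        tree = generate(rng, 4)
        print(to_infix(tree), "|", to_rpn(tree))
```

Seeding the generator is a deliberate choice here: a pretty printer or decompiler test that fails on some sentence can be replayed from the seed alone. NOT and XOR could be added as further productions in the same style.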

You may also want to take a look at the following:

* "Effect-Driven QuickChecking of Compilers" (notably, this goes
substantially further than relying solely on the grammar, by also
making use of the type system -- more in the paper):

Code (Effect-Driven Compiler Tester):

* "Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator"
- Kostya Serebryany, Vitaly Buka and Matt Morehouse - 2017 LLVM
Developers' Meeting

In particular:

"This directory contains two utilities for fuzzing Clang: clang-fuzzer
and clang-proto-fuzzer. Both use libFuzzer to generate inputs to clang
via coverage-guided mutation.

The two utilities differ, however, in how they structure inputs to
Clang. clang-fuzzer makes no attempt to generate valid C++ programs and
is therefore primarily useful for stressing the surface layers of Clang
(i.e. lexer, parser). clang-proto-fuzzer uses a protobuf class to
describe a subset of the C++ language and then uses libprotobuf-mutator
to mutate instantiations of that class, producing valid C++ programs in
the process. As a result, clang-proto-fuzzer is better at stressing
deeper layers of Clang and LLVM."
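The difference between the two styles of mutation can be illustrated on the small AND/OR expression language from the question. This is only a sketch in Python under stated assumptions: it does not use libFuzzer or libprotobuf-mutator, a nested tuple stands in for the protobuf message, and `mutate_bytes`, `mutate_tree`, `is_valid` and `render` are hypothetical names chosen for illustration.

```python
import random

VARIABLES = ["a", "b", "c"]
OPERATORS = ["AND", "OR"]


def mutate_bytes(text, rng):
    """Unstructured mutation: overwrite one character of the source text.

    The result frequently fails to parse, so it mostly exercises the
    lexer and parser of whatever consumes it.
    """
    i = rng.randrange(len(text))
    return text[:i] + chr(rng.randrange(32, 127)) + text[i + 1:]


def mutate_tree(node, rng):
    """Structured mutation: change one node of a (op, left, right) tree.

    Every result is again a well-formed tree, so the rendered sentence
    always parses and can reach the stages behind the parser.
    """
    if isinstance(node, str):
        # Leaf: either pick a variable again or grow a new subexpression.
        if rng.random() < 0.5:
            return rng.choice(VARIABLES)
        return (rng.choice(OPERATORS), node, rng.choice(VARIABLES))
    op, left, right = node
    choice = rng.randrange(3)
    if choice == 0:
        return (rng.choice(OPERATORS), left, right)  # re-pick the operator
    if choice == 1:
        return (op, mutate_tree(left, rng), right)   # descend into the left child
    return (op, left, mutate_tree(right, rng))       # descend into the right child


def is_valid(node):
    """Check that a value is a well-formed expression tree."""
    if isinstance(node, str):
        return node in VARIABLES
    return (isinstance(node, tuple) and len(node) == 3
            and node[0] in OPERATORS
            and is_valid(node[1]) and is_valid(node[2]))


def render(node):
    """Fully parenthesized infix text for a tree."""
    if isinstance(node, str):
        return node
    op, left, right = node
    return f"({render(left)} {op} {render(right)})"


if __name__ == "__main__":
    rng = random.Random(7)
    tree = ("AND", ("OR", "a", "b"), "c")
    print(mutate_bytes(render(tree), rng))   # may or may not still parse
    print(render(mutate_tree(tree, rng)))    # always a valid sentence
```

The sketch leaves out the coverage guidance that libFuzzer supplies; it only shows why mutating a structured representation keeps the output valid.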

For further reference, perhaps the following compiler correctness
resources (literature & software) can also be of help:


Matt P. Dziubinski
