Re: Test Generation Via Randomized Macro Expansion

hermann@dcc.ufmg.br (Hermann Oliveira Rodrigues)
15 May 2000 23:51:46 -0400

          From comp.compilers

Related articles
Test Generation Via Randomized Macro Expansion rfg@monkeys.com (Ronald F. Guilmette) (2000-05-12)
Re: Test Generation Via Randomized Macro Expansion hermann@dcc.ufmg.br (2000-05-15)
| List of all articles for this month |
From: hermann@dcc.ufmg.br (Hermann Oliveira Rodrigues)
Newsgroups: comp.compilers
Date: 15 May 2000 23:51:46 -0400
Organization: POP-MG
References: 00-05-055
Keywords: macros, comment

Ronald F. Guilmette (rfg@monkeys.com) wrote:
: A few years ago, I implemented a specialized system (TGGS) whose purpose
: was to generate randomized but syntatically valid test cases for compilers
: of various languages.


: It occured to me the other day that this sort of generation of (randomized)
: text obeying various context-free syntax rules could be construed as just
: an exercise in randomized macro expansion.


: Given this realization, I set out to create an enhanced version of the
: GNU m4 macro processor which would have in-built capabilities for
: performing randomized macro expansion. My goal was to create a
: version of GNU m4 which would make it easy to use m4 to generate
: randomized texts obeying various specified context-free syntax rules.
: I believe that I succeded. The results of my little effort are
: described below.


Some time ago, an instructor gave us an interesting challenge:
Implement a toy compiler for a toy language (called PLM2) only using
the m4 macro processor. The toy language had common features like,
if-then-else-endif, while-do-whileend and attribution. The compiler
should be able to identify constants and variables and allocate the
required memory for it in the assembly output file. Variable did not
need to be declared and the compiler should identify the first time
that a variable appear in the program.


The compiler generates a text file consisting of an assembly
description for the target machine, so an assembler could process it
pretty easy.


A toy example program follows:


//---------------------------------------------------//
// Program in PLM2 to multiply positive integers. //
//---------------------------------------------------//


read x;
read y;


// Test for input validation.
test1 is x <= 0;
test2 is y <= 0;
if test1 or test2 then
// The default output.
print 0;
exit;
else
// Input is valid.
endif.


// Do the multiplication.
par1 is x;
par2 is y;
call multiply;
print result;


// Multiplication procedure definition.
procedure multiply as
result is 0;
while par2 > 0 do
result is result + par1;
par2 is par2 - 1;
endwhile.
procedure$


//---------------------------------------------------


The compiler output looks like this (the labels are in
portuguese, sorry):


INP x
INP y
LDA x
SUB __var7
JZE __MENOR_IGUAL1
JLZ __MENOR_IGUAL1
LDA __ZERO
JMP __FIM_MENOR_IGUAL1
__MENOR_IGUAL1:
LDA __NAO_ZERO
__FIM_MENOR_IGUAL1:
JZE __SENAO1
OUT __var0
HLT
JMP __FIMSE1
__SENAO1:
__FIMSE1:


...


x: DC 0
y: DC 0
result: DC 0
par1: DC 0
par2: DC 0


...


All the given requirements were filled except the nesting of
control structures due parsing restrictions. The statements are
recognized by m4 using regular expressions and replaced by a macro
that `knows' how to define that statement.


for example,


if x <= 2 then
print 3;
exit;
else
print 2;
endif.


becomes


_if_(x < 2, print 3; exit;, print 2)


But we know that regular expressions can't be used to parse nested
patterns.


for example


if x <= 2 then
print 3;
else
if y > 4 then
print 1;
else
print 2;
endif
endif


becomes


if (x < 2 then print 3; else if y < 4, print 1;, print 2 endif)


This is not what we expected.


To keep the pattern matching under control, we add a `.' to
the structure termination and put that restriction in the pattern. So
the `expression', `then' and `else' contents can not include the `.'.
Unfortunately this killed any nested control structure detection hope.


I had a final idea to implement, in the m4, a new scanner that
could process the file, token by token, and put a suffix in the control
structures identifying their nesting level and which keywords could match
it. The keyword matching should be a `random' tag generated each time that
a new structure beginning was found.


The previous definitions should look as follow after the initial
preprocessing.


if_0_234111233 x <= 2 then_0_234111233
print 3;
else_0_234111233
if_1_45664212 y > 4 then_1__45664212
print 1;
else_1_45664212
print 2;
endif_1_45664212
endif_0_234111233


This should give the m4 processor enough information about the
structures to avoid wrong matchings.


Unfortunately the m4 did not have a random number generator in
that time and our idea could not be implemented.


To finish the whole history, we lost the challenge for one
item. But the instructor understand that we found a technical
limitation and gave us the merit and the points anyway.


Now, using the random generator implemented to the m4, the
next crazy students that accept the instructor challenges can complete
the tasks better than we could.


At least, I hope. :)


-----------------------------------------------------------
        Hermann Oliveira Rodrigues - hermann@dcc.ufmg.br
                Computer Science Student at UFMG - Brazil
[In 1972 as an undergrad, I wrote a compiler from a subset of APL to Basic
in Trac, a macrogenerator similar to M4. It worked pretty well, tried to
be smart about not generating matrix temporary values. -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.