Test Generation Via Randomized Macro Expansion

"Ronald F. Guilmette" <rfg@monkeys.com>
12 May 2000 22:31:42 -0400

          From comp.compilers

Related articles
Test Generation Via Randomized Macro Expansion rfg@monkeys.com (Ronald F. Guilmette) (2000-05-12)
Re: Test Generation Via Randomized Macro Expansion hermann@dcc.ufmg.br (2000-05-15)
| List of all articles for this month |

From: "Ronald F. Guilmette" <rfg@monkeys.com>
Newsgroups: comp.compilers
Date: 12 May 2000 22:31:42 -0400
Organization: Compilers Central
Keywords: macros, available

A few years ago, I implemented a specialized system (TGGS) whose purpose
was to generate randomized but syntatically valid test cases for compilers
of various languages.


It occured to me the other day that this sort of generation of (randomized)
text obeying various context-free syntax rules could be construed as just
an exercise in randomized macro expansion.


Given this realization, I set out to create an enhanced version of the
GNU m4 macro processor which would have in-built capabilities for
performing randomized macro expansion. My goal was to create a
version of GNU m4 which would make it easy to use m4 to generate
randomized texts obeying various specified context-free syntax rules.
I believe that I succeded. The results of my little effort are
described below.


++++++++++++++


In a nutshell, the m4 extension I have implemented allows for any given
macro to be assigned multime alternative definitions. For example:


define(`A',`B',`C',`D')


Here, `B', `C', and `D' are all alternative definitions of the macro `A'.


Given such a definition, each time the `A' macro is invoked, m4 will
randomly select one of these alternative definitions as the one to use
for that particular expansion of `A'.


This new feature allows for the generation of randomized text subject to
syntatic constraints. For example, the input file:


define(`noun',`cats',`dogs',`people')dnl
define(`verb',`love',`hate',`ignore')dnl
noun verb noun


when given to the (enhanced) GNU m4 macro processor, would yield randomized
sentences such as ``cats hate dogs'', ``people ignore cats'' etc.


Within m4, the random number sequence used as a basis for making the random
selections (of competing macro definitions) is obtained from the random(3)
library function (available, at least, on BSD-ish and Linux systems). This
function provides a single consistant and repeatable random number sequence
unless its behavior is modified by changing its seed, an operation which is
performed via the srandom(3) library function.


In order to allow for alternative seeding of the random(3) random number
generator, the extensions implemented by the patches below include the
addition (to m4) of a new built-in macro named `srandom'. This macro may
be called with either zero or one arguments. If called with one argument,
that argument must be a `number' (according to the pre-existing lexical
rules for `numbers' implemented within GNU m4). The number supplied is
converted to binary form and supplied (by m4) in a call to the library
srandom(3) function, thus seeding the random(3) generator. If called with
zero arguments, the new m4 built-in `srandom' will obtain a pseudo-random
number on its own, and will supply that in a call to the library srandom(3)
function to seed the random(3) generator.


Additionally, another new m4 built-in is implemented by the patches below,
i.e. the `randnum' built-in. This macro takes zero arguments and it
expands to a single 32-bit pseudo-random number. The pseudo-random number
generated by a call to this macro is obtained either (a) by reading 4 bytes
of data from /dev/urandom (on systems where such a device file exists, e.g.
BSD-ish and Linux systems) or else by repeatedly calling gettimeofday(2)
and shifting the lowest-order bit of the tv_usec (microseconds) field of
the result into an accumulator until a full 32-bit random number is pro-
duced. (The resulting number is likely to be highly random.) Note that:


srandom`'dnl


is exactly equivalent to:


srandom(randnum)dnl


Randomized generation of text, subject to syntatic constraints, has poten-
tially broad applications, including (but certainly not limited to) the
generation of input test cases for various types of software. Various
programs supporting syntatically-rich input, in particular, may benefit
from testing with automatically-generated inputs that obey syntatic con-
straints but which are otherwise randomized. For example, the following
m4 script, when supplied to the (modified) GNU m4 macro processor, gener-
ates syntatically valid but randomized texts within a subset of the input
language supported by the bc(1) bench calculator program available on many
variants of UNIX:


define(`digit',`0',`1',`2',`3',`4',`5',`6',`7',`8',`9')dnl
define(`digits',`digit',`digit`'digits')dnl
define(`literal',`digits')dnl
define(`atom',`literal',`( expression )')dnl
define(`primary',`atom',`primary ^ atom')dnl
define(`factor',`primary',`factor * atom',`factor / atom')dnl
define(`term',`factor',`term + factor',`term - factor')dnl
define(`expression',`term')dnl
srandom`'dnl
expression


(The direct correspondance between the m4 input shown above and the BNF
syntax rules for the equivalent subset of the bc(1) input language will
undoubtedly be noted by those accustomed to dealing with BNF specifica-
tions of language syntax.)


Due to the common occurance of self-recursive syntax rules within the
context-free descriptions of many languages (including the subset of
the bc(1) input language illustrated above) a special mechanism has been
implemented (as part of these extension to GNU m4) which has the effect
of helping to suppress ``runaway recursion'' during the expansion of
macros having mutiple alternative definitions. Quite simply, whenever a
given alternative definition of a given macro is selected (quasi-randomly)
as the expansion text to be used for a given call of the macro in question,
internal logic reduces (slightly) the probability of that alternative
definition being selected (again) as the operative expansion text for a
subsequent call to the same macro... at least which the first-order ex-
pansion text of the given macro is still being processed/rescanned by m4.
This automatic/temporary ``depreciation'' of selected alternatives effec-
tively prevents limitless ``runaway'' recursive macro expansions, at least
in cases where each macro definition provides one (or more) alternative
definitions that are non-recursive.


The patches below are relative to a virgin source tree for GNU m4 version
1.4. (See www.gnu.org.) Build problems may arise if an attempt is made
to build the modified GNU m4 sources on any system which fails to include
random(3), srandom(3), or gettimeofday(2) in the default libc library.




Ron Guilmette
Roseville, California




P.S. Extension of this basic idea to include provisions for enforcing
semantic constraints on the generated output (in addition to simple context-
free syntatic constraints) is left as an exercise for the reader.






cut here for GNU m4 (v1.4) patches
---------------------------------------------------------------------------
diff -rc2 ./src/builtin.c ./src/builtin.c
*** ./src/builtin.c Wed Aug 31 09:45:12 1994
--- ./src/builtin.c Thu May 11 10:31:58 2000
***************
*** 21,24 ****
--- 21,26 ----


    #include "m4.h"
+ #include <sys/time.h>
+ #include <fcntl.h>


    extern FILE *popen ();
***************
*** 69,75 ****
--- 71,79 ----
    DECLARE (m4_popdef);
    DECLARE (m4_pushdef);
+ DECLARE (m4_randnum);
    DECLARE (m4_regexp);
    DECLARE (m4_shift);
    DECLARE (m4_sinclude);
+ DECLARE (m4_srandom);
    DECLARE (m4_substr);
    DECLARE (m4_syscmd);
***************
*** 123,129 ****
--- 127,135 ----
        { "popdef", FALSE, FALSE, TRUE, m4_popdef },
        { "pushdef", FALSE, TRUE, TRUE, m4_pushdef },
+ { "randnum", TRUE, FALSE, FALSE, m4_randnum },
        { "regexp", TRUE, FALSE, TRUE, m4_regexp },
        { "shift", FALSE, FALSE, FALSE, m4_shift },
        { "sinclude", FALSE, FALSE, TRUE, m4_sinclude },
+ { "srandom", TRUE, TRUE, FALSE, m4_srandom },
        { "substr", FALSE, FALSE, TRUE, m4_substr },
        { "syscmd", FALSE, FALSE, TRUE, m4_syscmd },
***************
*** 190,193 ****
--- 196,203 ----


        sym = lookup_symbol (name, mode);
+ SYMBOL_NDEFS (sym) = 1;
+ SYMBOL_DEFS (sym) = (token_data *) xmalloc (sizeof (token_data));
+ SYMBOL_WEIGHT (sym) = (unsigned *) xmalloc (sizeof (unsigned));
+ SYMBOL_WEIGHT (sym)[0] = initial_weight;
        SYMBOL_TYPE (sym) = TOKEN_FUNC;
        SYMBOL_MACRO_ARGS (sym) = bp->groks_macro_args;
***************
*** 199,217 ****
    /*-------------------------------------------------------------------------.
    | Define a predefined or user-defined macro, with name NAME, and expansion |
! | TEXT. MODE destinguishes between the "define" and the "pushdef" case. |
    | It is also used from main (). |
    `-------------------------------------------------------------------------*/


    void
! define_user_macro (const char *name, const char *text, symbol_lookup mode)
    {
        symbol *s;


        s = lookup_symbol (name, mode);
        if (SYMBOL_TYPE (s) == TOKEN_TEXT)
! xfree (SYMBOL_TEXT (s));


! SYMBOL_TYPE (s) = TOKEN_TEXT;
! SYMBOL_TEXT (s) = xstrdup (text);
    }


--- 209,238 ----
    /*-------------------------------------------------------------------------.
    | Define a predefined or user-defined macro, with name NAME, and expansion |
! | DEFS. MODE destinguishes between the "define" and the "pushdef" case. |
    | It is also used from main (). |
    `-------------------------------------------------------------------------*/


    void
! define_user_macro (const char *name, unsigned ndefs, token_data *defs, symbol_lookup mode)
    {
        symbol *s;
+ unsigned old_ndefs;
+ unsigned i;


        s = lookup_symbol (name, mode);
+ old_ndefs = SYMBOL_NDEFS (s);
        if (SYMBOL_TYPE (s) == TOKEN_TEXT)
! {
! for (i = 0; i < old_ndefs; i++)
! xfree (SYMBOL_TEXT (s, i));
! xfree (SYMBOL_DEFS (s));
! xfree (SYMBOL_WEIGHT (s));
! }


! SYMBOL_NDEFS (s) = ndefs;
! SYMBOL_WEIGHT (s) = (unsigned *) xmalloc (ndefs * sizeof (unsigned));
! for (i = 0; i < ndefs; i++)
! SYMBOL_WEIGHT (s)[i] = initial_weight;
! SYMBOL_DEFS (s) = defs;
    }


***************
*** 241,254 ****


        for (pp = &predefined_tab[0]; pp->func != NULL; pp++)
! if (no_gnu_extensions)
! {
! if (pp->unix_name != NULL)
! define_user_macro (pp->unix_name, pp->func, SYMBOL_INSERT);
! }
! else
! {
! if (pp->gnu_name != NULL)
! define_user_macro (pp->gnu_name, pp->func, SYMBOL_INSERT);
! }
    }


--- 262,289 ----


        for (pp = &predefined_tab[0]; pp->func != NULL; pp++)
! {
! token_data *p;
!
! if (no_gnu_extensions)
! {
! if (pp->unix_name != NULL)
! {
! p = (token_data *) xmalloc (sizeof (token_data));
! TOKEN_DATA_TYPE (p) = TOKEN_TEXT;
! TOKEN_DATA_TEXT (p) = xstrdup (pp->func);
! define_user_macro (pp->unix_name, 1, p, SYMBOL_INSERT);
! }
! }
! else
! {
! if (pp->gnu_name != NULL)
! {
! p = (token_data *) xmalloc (sizeof (token_data));
! TOKEN_DATA_TYPE (p) = TOKEN_TEXT;
! TOKEN_DATA_TEXT (p) = xstrdup (pp->func);
! define_user_macro (pp->gnu_name, 1, p, SYMBOL_INSERT);
! }
! }
! }
    }


***************
*** 407,412 ****
    {
        const builtin *bp;


! if (bad_argc (argv[0], argc, 2, 3))
            return;


--- 442,448 ----
    {
        const builtin *bp;
+ token_data *p;


! if (bad_argc (argv[0], argc, 2, no_gnu_extensions ? 3 : -1))
            return;


***************
*** 416,427 ****
        if (argc == 2)
            {
! define_user_macro (ARG (1), "", mode);
                return;
            }


        switch (TOKEN_DATA_TYPE (argv[2]))
            {
            case TOKEN_TEXT:
! define_user_macro (ARG (1), ARG (2), mode);
                break;


--- 452,486 ----
        if (argc == 2)
            {
! p = (token_data *) xmalloc (sizeof (token_data));
! TOKEN_DATA_TYPE (p) = TOKEN_TEXT;
! TOKEN_DATA_TEXT (p) = xstrdup ("");
! define_user_macro (ARG (1), 1, p, mode);
                return;
            }


+ /* We assume here that either (a) argv points to a list of pointers which
+ all point to things having type TOKEN_TEXT, or else (b) argv points to
+ a single pointer which points to a thing having type TOKEN_FUNC. Mixing
+ these two types of things in argv is assumed not to occur. */
+
        switch (TOKEN_DATA_TYPE (argv[2]))
            {
            case TOKEN_TEXT:
! {
! token_data *defs =
! (token_data *) xmalloc ((argc - 2) * sizeof (token_data));
! token_data *const *inp;
! token_data *const *const limit = &argv[argc];
! token_data *outp;
!
! for (inp = &argv[2], outp = defs; inp < limit; inp++, outp++)
! {
! /* Copy the TOKEN_TEXT structure. */
! *outp = **inp;
! /* Make a duplicate copy of the actual text. */
! TOKEN_DATA_TEXT (outp) = xstrdup (TOKEN_DATA_TEXT (*inp));
! }
! define_user_macro (ARG (1), argc - 2, defs, mode);
! }
                break;


***************
*** 618,621 ****
--- 677,682 ----
        for (; data.size > 0; --data.size, data.base++)
            {
+ unsigned ndefs, i;
+
                DEBUG_PRINT1 ("%s:\t", SYMBOL_NAME (data.base[0]));


***************
*** 623,631 ****
     {
     case TOKEN_TEXT:
! if (debug_level & DEBUG_TRACE_QUOTE)
! DEBUG_PRINT3 ("%s%s%s\n",
! lquote.string, SYMBOL_TEXT (data.base[0]), rquote.string);
! else
! DEBUG_PRINT1 ("%s\n", SYMBOL_TEXT (data.base[0]));
     break;


--- 684,697 ----
     {
     case TOKEN_TEXT:
! ndefs = SYMBOL_NDEFS (data.base[0]);
! for (i = 0; i < ndefs; i++)
! {
! if (debug_level & DEBUG_TRACE_QUOTE)
! DEBUG_PRINT3 ("%s%s%s ",
! lquote.string, SYMBOL_TEXT (data.base[0], i), rquote.string);
! else
! DEBUG_PRINT1 ("%s ", SYMBOL_TEXT (data.base[0], i));
! }
! DEBUG_PRINT1 ("\n", "");
     break;


***************
*** 698,701 ****
--- 764,834 ----
    }


+ static unsigned long
+ randnum (void)
+ {
+ unsigned long randval;
+ int fd;
+ char *pathname = "/dev/urandom";
+
+ /* Try using /dev/urandom first, if available. */
+ if ((fd = open (pathname, O_RDONLY)) != -1)
+ {
+ if (read (fd, &randval, sizeof randval) == -1)
+ M4ERROR ((EXIT_FAILURE, errno, "Error reading %s", pathname));
+ close (fd);
+ }
+ else /* Create a crappy (but fast) approximation to a random number. */
+ {
+ struct timeval tv;
+ int i;
+ unsigned long rand1 = 0;
+ unsigned long rand2 = 0;
+
+ for (i = 0; i < 32; i++)
+ {
+ gettimeofday (&tv, NULL);
+ rand1 = (rand1 << 1) | (tv.tv_usec & 1);
+ }
+ for (i = 0; i < 32; i++)
+ {
+ gettimeofday (&tv, NULL);
+ rand2 = (rand2 << 1) | (tv.tv_usec & 1);
+ }
+ randval = rand1 ^ rand2;
+ }
+ return randval;
+ }
+
+ static void
+ m4_randnum (struct obstack *obs, int argc, token_data **argv)
+ {
+ unsigned randval;
+
+ if (bad_argc (argv[0], argc, 1, 1))
+ return;
+ randval = randnum ();
+ shipout_int (obs, (int) randval);
+ }
+
+ static void
+ m4_srandom (struct obstack *obs, int argc, token_data **argv)
+ {
+ int value;
+ unsigned seed;
+
+ if (bad_argc (argv[0], argc, 1, 2))
+ return;
+
+ if (argc == 1)
+ seed = randnum ();
+ else
+ {
+ if (!numeric_arg (argv[0], ARG (1), &value))
+ return;
+ seed = (unsigned) value;
+ }
+ srandom (seed);
+ }
+
    /*-------------------------------------------------------------------------.
    | The macro "defn" returns the quoted definition of the macro named by the |
***************
*** 708,711 ****
--- 841,846 ----
    {
        symbol *s;
+ unsigned ndefs;
+ unsigned i;


        if (bad_argc (argv[0], argc, 2, 2))
***************
*** 719,724 ****
            {
            case TOKEN_TEXT:
                obstack_grow (obs, lquote.string, lquote.length);
! obstack_grow (obs, SYMBOL_TEXT (s), strlen (SYMBOL_TEXT (s)));
                obstack_grow (obs, rquote.string, rquote.length);
                break;
--- 854,869 ----
            {
            case TOKEN_TEXT:
+ ndefs = SYMBOL_NDEFS (s);
+ i = 0;
+ if (ndefs > 1)
+ for (; i < ndefs - 1; i++)
+ {
+ obstack_grow (obs, lquote.string, lquote.length);
+ obstack_grow (obs, SYMBOL_TEXT (s, i), strlen (SYMBOL_TEXT (s, i)));
+ obstack_grow (obs, rquote.string, rquote.length);
+ obstack_grow (obs, ',', 1);
+ }
                obstack_grow (obs, lquote.string, lquote.length);
! obstack_grow (obs, SYMBOL_TEXT (s, i), strlen (SYMBOL_TEXT (s, i)));
                obstack_grow (obs, rquote.string, rquote.length);
                break;
***************
*** 1672,1675 ****
--- 1817,1892 ----
    }


+ /* Here we make a quasi-random choice among all of the various alternative
+ definitions of the macro specified by `sym'. The value we return is a
+ index value, between 0 and SYMBOL_NDEFS (sym), inclusive.
+
+ The choice we make is not entirely random, but may instead be tempered
+ somewhat by the quasi-random choices that we have made previously when
+ selecting from among the available alternative definitions of the same
+ macro.
+
+ All alternatives are initially created equal, with an equal probability
+ of being selected, however that changes as we select alternatives. Each
+ time we select a given alternative for a given macro call, we ``depreciate''
+ that alternative, i.e. we make it slightly less likely that the same
+ alternative will be selected again, for a subsequent expansion of the
+ same macro, at least during the time that we are still processing and
+ rescanning the expansion text for the given macro call. When we have
+ completely finished processing/rescanning the expansion text for the
+ given macro call, we effectively undo the depreciation of the selected
+ alternative definition that we performed earlier, when we first expanded
+ the macro. (In effect we ``re-appreciate'' the relevant alternative macro
+ definition again.)
+
+ This process of depreciating (and subsequently re-appreciating) the
+ specific macro definition alternatives that we select (quasi-randomly)
+ for actual expansion solves the problem of runaway recursive macro
+ expansion. For example, within this depreciation/appreciation mechan-
+ ism, the following macro definition, when expanded, would very likely
+ result in runaway recursive macro expansion:
+
+ define(`A',`B',`A A A')
+
+ Fortunately however, the depreciation/appreciation mechanism insures
+ that during any given expansion of `A', each alternative definition of
+ this macro that is actually selected for use becomes a depreciated
+ (slightly less probable) alternative for subsequent expansions of `A'.
+ Thus, to the extent that the `A A A' alternative is actually selected
+ for use in the given expansion of `A', we make it slightly less likely
+ that this same alternative will be selected again, for subsequent ex-
+ pansions of `A' that we may need to perform while we are still proces-
+ sing and rescanning the original first-order expansion of `A'. (The
+ change in weighting of the alternatives is effectively un-done and
+ disappears after we finish processing/rescanning the expansion text
+ for the relevant macro call however, thus restoring whatever weightings
+ applied to the various alternatives prior to the relevant expansion.)
+
+ The `random_choice' function below is responsible only for making the
+ weighted quasi-random selections from among competing alternative defi-
+ nitions of a given macro. The functions that perform the depreciation
+ and subsequent re-appreciation of a selected alternative live over in
+ the input.c file, and are called, appropriately enough `depreciate' and
+ `appreciate'.
+ */
+
+ static unsigned
+ random_choice (symbol *sym)
+ {
+ register unsigned ndefs = SYMBOL_NDEFS (sym);
+ register unsigned weights_sum;
+ register unsigned r, i;
+
+ if (ndefs == 0)
+ abort (); /* Should never happen. */
+ if (ndefs == 1)
+ return 0; /* Shortcut. */
+ for (weights_sum = 0, i = 0; i < ndefs; i++)
+ weights_sum += SYMBOL_WEIGHT (sym)[i];
+ r = ((unsigned long) random ()) % weights_sum;
+ for (weights_sum = 0, i = 0; i < ndefs; i++)
+ if (r < (weights_sum += SYMBOL_WEIGHT (sym)[i]))
+ return i;
+ }
+
    /*-------------------------------------------------------------------------.
    | This function handles all expansion of user defined and predefined |
***************
*** 1686,1691 ****
        register const char *text;
        int i;


! for (text = SYMBOL_TEXT (sym); *text != '\0';)
            {
                if (*text != '$')
--- 1903,1914 ----
        register const char *text;
        int i;
+ int r = random_choice (sym);
+ extern symbol *current_macro;
+ extern unsigned current_defnum;
+
+ current_macro = sym;
+ current_defnum = r;


! for (text = SYMBOL_TEXT (sym, r); *text != '\0';)
            {
                if (*text != '$')
diff -rc2 ./src/freeze.c ./src/freeze.c
*** ./src/freeze.c Wed Nov 2 08:17:03 1994
--- ./src/freeze.c Wed May 10 13:05:05 2000
***************
*** 98,109 ****
                for (sym = symtab[h]; sym; sym = SYMBOL_NEXT (sym))
     {
     switch (SYMBOL_TYPE (sym))
     {
     case TOKEN_TEXT:
! fprintf (file, "T%d,%d\n",
! (int) strlen (SYMBOL_NAME (sym)),
! (int) strlen (SYMBOL_TEXT (sym)));
     fputs (SYMBOL_NAME (sym), file);
! fputs (SYMBOL_TEXT (sym), file);
     fputc ('\n', file);
     break;
--- 98,115 ----
                for (sym = symtab[h]; sym; sym = SYMBOL_NEXT (sym))
     {
+ unsigned ndefs = SYMBOL_NDEFS (sym);
+ unsigned i;
+
     switch (SYMBOL_TYPE (sym))
     {
     case TOKEN_TEXT:
! fprintf (file, "K%d\n", 1 + ndefs);
! fprintf (file, "T%d", (int) strlen (SYMBOL_NAME (sym)));
! for (i = 0; i < ndefs; i++)
! fprintf (file, ",%d", (int) strlen (SYMBOL_TEXT (sym, i)));
! fprintf (file, "\n");
     fputs (SYMBOL_NAME (sym), file);
! for (i = 0; i < ndefs; i++)
! fputs (SYMBOL_TEXT (sym, i), file);
     fputc ('\n', file);
     break;
***************
*** 175,182 ****
        int character;
        int operation;
! char *string[2];
! int allocated[2];
! int number[2];
        const builtin *bp;


    #define GET_CHARACTER \
--- 181,190 ----
        int character;
        int operation;
! char **string;
! int arg_count = 2;
! int *allocated;
! int *number;
        const builtin *bp;
+ int i;


    #define GET_CHARACTER \
***************
*** 207,210 ****
--- 215,222 ----
            M4ERROR ((EXIT_FAILURE, errno, "Cannot open %s", name));


+ string = (char **) xmalloc (arg_count * sizeof (char *));
+ allocated = (int *) xmalloc (arg_count * sizeof (int));
+ number = (int *) xmalloc (arg_count * sizeof (int));
+
        allocated[0] = 100;
        string[0] = xmalloc ((size_t) allocated[0]);
***************
*** 234,241 ****
     break;


                case 'C':
                case 'D':
                case 'F':
- case 'T':
                case 'Q':
     operation = character;
--- 246,310 ----
     break;


+ case 'K':
+ for (i = 0; i < arg_count; i++)
+ xfree (string[i]);
+ xfree (string);
+ xfree (allocated);
+ xfree (number);
+ GET_CHARACTER;
+ GET_NUMBER (arg_count);
+ VALIDATE ('\n');
+ string = (char **) xmalloc (arg_count * sizeof (char *));
+ allocated = (int *) xmalloc (arg_count * sizeof (int));
+ number = (int *) xmalloc (arg_count * sizeof (int));
+ for (i = 0; i < arg_count; i++)
+ string[i] = xmalloc ((size_t) (allocated[i] = 100));
+ break;
+
+ case 'T':
+ GET_CHARACTER;
+
+ /* Get string lengths. */
+ GET_NUMBER (number[0]);
+ for (i = 1; i < arg_count; i++)
+ {
+ VALIDATE (',');
+ GET_CHARACTER;
+ GET_NUMBER (number[i]);
+ }
+ VALIDATE ('\n');
+
+ for (i = 0; i < arg_count; i++)
+ {
+ if (number[i] + 1 > allocated[i])
+ {
+ free (string[i]);
+ allocated[i] = number[i] + 1;
+ string[i] = xmalloc ((size_t) allocated[i]);
+ }
+
+ if (number[i] > 0)
+ if (!fread (string[i], (size_t) number[i], 1, file))
+ M4ERROR ((EXIT_FAILURE, 0, "Premature end of frozen file"));
+
+ string[i][number[i]] = '\0';
+ }
+ GET_CHARACTER;
+ VALIDATE ('\n');
+
+ /* Enter a macro having an expansion text as a definition. */
+ {
+ token_data *defs =
+ (token_data *) xmalloc ((arg_count - 1) * sizeof (token_data));
+
+ for (i = 1; i < arg_count; i++)
+ TOKEN_DATA_TEXT (&(defs[i-1])) = xstrdup (string[i]);
+ define_user_macro (string[0], arg_count - 1, defs, SYMBOL_PUSHDEF);
+ }
+ break;
+
                case 'C':
                case 'D':
                case 'F':
                case 'Q':
     operation = character;
***************
*** 324,334 ****
    `%s' from frozen file not found in builtin table!",
     string[0]));
- break;
-
- case 'T':
-
- /* Enter a macro having an expansion text as a definition. */
-
- define_user_macro (string[0], string[1], SYMBOL_PUSHDEF);
     break;


--- 393,396 ----
diff -rc2 ./src/input.c ./src/input.c
*** ./src/input.c Mon Aug 29 23:10:52 1994
--- ./src/input.c Thu May 11 10:44:12 2000
***************
*** 73,76 ****
--- 73,78 ----
     {
     char *string; /* string value */
+ symbol *sym;
+ unsigned defnum;
     }
                u_s;
***************
*** 104,107 ****
--- 106,115 ----
    int current_line;


+ /* Current macro being expanded. */
+ symbol *current_macro;
+
+ /* Current macro alternative definition selected. */
+ unsigned current_defnum;
+
    /* Obstack for storing individual tokens. */
    static struct obstack token_stack;
***************
*** 236,242 ****
--- 244,306 ----
     sizeof (struct input_block));
        next->type = INPUT_STRING;
+ next->u.u_s.sym = NULL;
        return current_input;
    }


+ /* Depreciate a specified alternative definition for a specified macro,
+ thus making it slightly less likely that the same alternative will be
+ selected again (as the operative expansion of the same macro) when and
+ if the same macro is called again while we are still processing and
+ rescanning the expansion text for the current call of the relevant
+ macro.
+
+ We effectively depreciate the specified alternative definition of the
+ specified macro by adding more weight to all of the _other_ alternative
+ definitions of the same macro.
+
+ For more information about the depreciation and appreciation of macro
+ definitions, see the comments above the `random_choice' function in
+ the `builtin.c' file. */
+
+ static void
+ depreciate (symbol *sym, unsigned i)
+ {
+ register unsigned ndefs = SYMBOL_NDEFS (sym);
+ register unsigned j;
+
+ if (ndefs == 0)
+ abort (); /* Should never happen. */
+ if (ndefs == 1)
+ return; /* Shortcut. */
+ for (j = 0; j < ndefs; j++)
+ if (j != i)
+ SYMBOL_WEIGHT (sym)[j]++;
+ }
+
+ /* Undo the depreciation of the specified alternative definion of the speci-
+ fied macro. The `appreciate' function performs the exact opposite of
+ what is done by the `depreciate' function above, i.e. we (re-)appreciate
+ the specified alternative definition for the specified macro by reducing
+ the weight of all of the _other_ alternative definitions of the same macro.
+
+ For more information about the depreciation and appreciation of macro
+ definitions, see the comments above the `random_choice' function in
+ the `builtin.c' file. */
+
+ static void
+ appreciate (symbol *sym, unsigned i)
+ {
+ register unsigned ndefs = SYMBOL_NDEFS (sym);
+ register unsigned j;
+
+ if (ndefs == 0)
+ abort (); /* Should never happen. */
+ if (ndefs == 1)
+ return; /* Shortcut. */
+ for (j = 0; j < ndefs; j++)
+ if (j != i)
+ SYMBOL_WEIGHT (sym)[j]--;
+ }
+
    /*------------------------------------------------------------------------.
    | Last half of push_string (). If next is now NULL, a call to push_file |
***************
*** 260,263 ****
--- 324,331 ----
                obstack_1grow (current_input, '\0');
                next->u.u_s.string = obstack_finish (current_input);
+ next->u.u_s.sym = current_macro;
+ next->u.u_s.defnum = current_defnum;
+ depreciate (current_macro, current_defnum);
+ current_macro = NULL;
                next->prev = isp;
                isp = next;
***************
*** 304,307 ****
--- 372,379 ----
            {
            case INPUT_STRING:
+ if (isp->u.u_s.sym != NULL)
+ appreciate (isp->u.u_s.sym, isp->u.u_s.defnum);
+ break;
+
            case INPUT_MACRO:
                break;
diff -rc2 ./src/m4.c ./src/m4.c
*** ./src/m4.c Tue Nov 1 19:14:28 1994
--- ./src/m4.c Wed May 10 13:05:05 2000
***************
*** 418,422 ****
     else
     *macro_value++ = '\0';
! define_user_macro (defines->macro, macro_value, SYMBOL_INSERT);
     break;


--- 418,428 ----
     else
     *macro_value++ = '\0';
! {
! token_data *p = (token_data *) xmalloc (sizeof (token_data));
!
! TOKEN_DATA_TYPE (p) = TOKEN_TEXT;
! TOKEN_DATA_TEXT (p) = xstrdup (macro_value);
! define_user_macro (defines->macro, 1, p, SYMBOL_INSERT);
! }
     break;


diff -rc2 ./src/m4.h ./src/m4.h
*** ./src/m4.h Sun Oct 30 23:15:50 1994
--- ./src/m4.h Thu May 11 09:48:37 2000
***************
*** 365,368 ****
--- 365,370 ----
    };


+ enum { initial_weight = 2 };
+
    /* Symbol table entry. */
    struct symbol
***************
*** 375,381 ****


        char *name;
! token_data data;
    };


    #define SYMBOL_NEXT(S) ((S)->next)
    #define SYMBOL_TRACED(S) ((S)->traced)
--- 377,390 ----


        char *name;
! unsigned ndefs;
! unsigned *weight;
! token_data *data;
    };


+ /* Note that we allow for there to be multiple definitions in cases where
+ the symbol is defined to multiple things having type TOKEN_TEXT, but if
+ the type of the definition is either TOKEN_VOID or TOKEN_FUNC, then only
+ a single definition is allowed. */
+
    #define SYMBOL_NEXT(S) ((S)->next)
    #define SYMBOL_TRACED(S) ((S)->traced)
***************
*** 384,390 ****
    #define SYMBOL_BLIND_NO_ARGS(S) ((S)->blind_no_args)
    #define SYMBOL_NAME(S) ((S)->name)
! #define SYMBOL_TYPE(S) (TOKEN_DATA_TYPE (&(S)->data))
! #define SYMBOL_TEXT(S) (TOKEN_DATA_TEXT (&(S)->data))
! #define SYMBOL_FUNC(S) (TOKEN_DATA_FUNC (&(S)->data))


    typedef enum symbol_lookup symbol_lookup;
--- 393,402 ----
    #define SYMBOL_BLIND_NO_ARGS(S) ((S)->blind_no_args)
    #define SYMBOL_NAME(S) ((S)->name)
! #define SYMBOL_NDEFS(S) ((S)->ndefs)
! #define SYMBOL_DEFS(S) ((S)->data)
! #define SYMBOL_WEIGHT(S) ((S)->weight)
! #define SYMBOL_TYPE(S) (TOKEN_DATA_TYPE (&(S)->data[0]))
! #define SYMBOL_FUNC(S) (TOKEN_DATA_FUNC (&(S)->data[0]))
! #define SYMBOL_TEXT(S,N) (TOKEN_DATA_TEXT (&(S)->data[N]))


    typedef enum symbol_lookup symbol_lookup;
***************
*** 428,432 ****
    void builtin_init _((void));
    void define_builtin _((const char *, const builtin *, symbol_lookup, boolean));
! void define_user_macro _((const char *, const char *, symbol_lookup));
    void undivert_all _((void));
    void expand_user_macro _((struct obstack *, symbol *, int, token_data **));
--- 440,444 ----
    void builtin_init _((void));
    void define_builtin _((const char *, const builtin *, symbol_lookup, boolean));
! void define_user_macro _((const char *, unsigned, token_data *, symbol_lookup));
    void undivert_all _((void));
    void expand_user_macro _((struct obstack *, symbol *, int, token_data **));
diff -rc2 ./src/symtab.c ./src/symtab.c
*** ./src/symtab.c Tue Nov 1 19:11:34 1994
--- ./src/symtab.c Thu May 11 00:09:10 2000
***************
*** 83,87 ****
            xfree (SYMBOL_NAME (sym));
        if (SYMBOL_TYPE (sym) == TOKEN_TEXT)
! xfree (SYMBOL_TEXT (sym));
        xfree ((voidstar) sym);
    }
--- 83,95 ----
            xfree (SYMBOL_NAME (sym));
        if (SYMBOL_TYPE (sym) == TOKEN_TEXT)
! {
! unsigned ndefs = SYMBOL_NDEFS (sym);
! unsigned i;
!
! for (i = 0; i < ndefs; i++)
! xfree (SYMBOL_TEXT (sym, i));
! xfree (SYMBOL_DEFS (sym));
! xfree (SYMBOL_WEIGHT (sym));
! }
        xfree ((voidstar) sym);
    }
***************
*** 145,148 ****
--- 153,160 ----


                sym = (symbol *) xmalloc (sizeof (symbol));
+ SYMBOL_NDEFS (sym) = 1;
+ SYMBOL_DEFS (sym) = (token_data *) xmalloc (sizeof (token_data));
+ SYMBOL_WEIGHT (sym) = (unsigned *) xmalloc (sizeof (unsigned));
+ SYMBOL_WEIGHT (sym)[0] = initial_weight;
                SYMBOL_TYPE (sym) = TOKEN_VOID;
                SYMBOL_TRACED (sym) = SYMBOL_SHADOWED (sym) = FALSE;





Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.