The UPS C Interpreter

Dibyendu Majumdar <>
30 Jul 1999 22:44:59 -0400

          From comp.compilers


From: Dibyendu Majumdar <>
Newsgroups: comp.compilers
Date: 30 Jul 1999 22:44:59 -0400
Organization: Compilers Central
Keywords: C, interpreter

The UPS C Debugger includes a reasonably complete ISO C Interpreter.
The C Interpreter is completely independent of the UPS C Debugger, and
can be built and used in its own right. The standard UPS distribution
does not document the interpreter - however it does describe the
mechanism used to "link" interpreter object files into interpreter
"executables". This information is in the

ups/doc/ document.

Information about UPS is available from .

This document describes the C Interpreter in ups-3.34-beta4. The C
Interpreter in ups-3.33 does not work in stand-alone mode.

Interpreter usage
The C interpreter comes in two flavours.

1. To compile and interpret a single .c file, you can use 'cg'. The syntax
      for 'cg' is as follows:

      cg [-I<dir>] [-oldc] [-D<sym>[=<value>]] [-U<sym>] <source file>
                <arg1> <arg2> ...

      The -I, -D and -U flags have the same meaning as in most C compilers.

      The -oldc flag suppresses a few warning messages related to the use
      of pre-ANSI C features.

      Note that the C compiler must be in the PATH when executing 'cg'. This
      is because 'cg' calls the compiler with the -E flag to preprocess the
      source file. The -I, -D, and -U flags are passed to the compiler and
      ignored by 'cg' itself.

      By default the C compiler used as a preprocessor is the one that was
      used to build UPS. See the build instructions if you want to change this.

2. The UPS C interpreter can also interpret applications built from
      multiple source files. This is achieved as follows:

      First, each source is compiled into bytecode using 'cx'. 'cx' produces
      .ocx files - which are similar in purpose to the .o files produced by
      a C compiler.

      After all sources have been compiled, the .ocx files are linked to form
      the executable. The executable does not contain the .ocx files - rather
      it contains references to them. This allows it to be very compact.

      Most of the real code continues to reside in the .ocx files, and is
      loaded at run-time on demand.

      Once the executable has been built it can be executed by the 'xc'
      command. Alternatively, if 'xc' is placed in the appropriate location,
      the executable can be run directly from the UNIX prompt using the
      '#!' facility.

      The syntax for 'cx' is as follows:

      cx [-o executable] [-c] [-g] [-asm] [-S] [-oldc] [-D<sym>[=<value>]]
                [-U<sym>] [-I<dir>] <source files ...>

      The various options are summarized below:

                -o     The argument names the executable. Default is cx.out.
                -c     Means compile only.
                -g     Include debugging information. Allows interpreter
                       to be debugged using UPS.
                -S     Produce assembler source (with original source).
                       Requires -g.
                -asm   Produce assembler source only.
                -oldc  Suppress warnings related to use of pre-ANSI features.
                -I     Look for header files included with <> in <dir>.
                       This option may be supplied more than once.
                -D     Define preprocessor symbol.
                -U     Undefine preprocessor symbol.

      Note that the GNU C compiler must be in the PATH when executing 'cg'.
      This is because 'cg' calls 'gcc' with the -E flag to preprocess the
      source file. The -I, -D, and -U flags are passed to 'gcc' and ignored
      by 'cg' itself.

      Note that the assembler source shows the assembly language used by the
      interpreter - it cannot be processed further in any way - and is useful
      only for understanding/debugging the interpreter.

Both cg and cx recognise the following environment variable:

                LEX_DEBUG - if set, causes the lexical analyzer to dump
                            debugging messages.

3. To execute an interpreter binary (created by 'cx'), use 'xc' as follows:

      xc [-s<n>] <binary> [<args ...>]

      The -s option allows you to specify a stack size in bytes. The default
      stack size is 40000 bytes.

The Language
The UPS C Interpreter implements a good subset of the ISO C 90 language.

It gets some things wrong (see PROBLEMS for a list of these), and does
not implement some features of ISO C, such as:

                * Multi-byte characters/strings
                * Wide characters/strings
                * Trigraphs
                * Initialization of local aggregates

Support for ISO C library
The Interpreter has in-built support for many ISO C library functions.
The following headers are not supported:


Apart from the above, the following C library functions/macros/typedefs are
not supported:


The header files provided can be used to obtain a subset of the ISO C
library on Linux. The headers may need customization on other platforms.

Note that the support for header files and functions is operating system
dependent.


Problems
The UPS C Interpreter has a number of problems, but none of these are
show-stoppers (in my opinion). As far as I know, the interpreter is
ISO C compliant in other respects (please let me know if you discover
otherwise).
1. Non-enum values cannot be assigned to enum variables without a cast.

      enum bool { false, true };
      typedef enum bool bool;

      int main(void)
      {
              bool bad = 5;        /* fails */
              bool good = (bool)5; /* ok */
              return 0;
      }

2. The increment (++) or decrement (--) operators do not work with
      floating point values.
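
A minimal sketch of one way to sidestep this restriction, assuming that
compound assignment on floating point values is unaffected (the post only
names ++ and --). The function name sum_to is hypothetical.

```c
/* Hypothetical helper: adds 1.0 + 2.0 + ... up to and including limit. */
double sum_to(double limit)
{
    double x = 1.0;
    double total = 0.0;

    while (x <= limit) {
        total += x;
        x += 1.0;   /* used in place of x++ */
    }
    return total;
}
```

For example, sum_to(4.0) adds 1.0, 2.0, 3.0 and 4.0.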

3. Automatic arrays and structures cannot be initialised in functions. For
      example, this doesn't work:

      int func(void)
      {
            int array[] = { 1, 2 };
      }

      However, the following works fine:

      int func(void)
      {
            static int array[] = { 1, 2 };
      }
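
Where a static array is not wanted (for instance because each call needs
its own copy), another option is to declare the array without an
initialiser and assign the elements one by one. This is only a sketch; the
function name sum_pair is hypothetical.

```c
/* Hypothetical example: an automatic array filled by assignment. */
int sum_pair(void)
{
    int array[2];   /* declared without an initialiser */

    array[0] = 1;   /* elements are assigned individually */
    array[1] = 2;
    return array[0] + array[1];
}
```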

4. ISO C rules for identifier scopes are:

      1. Structure, union, and enumeration tags have scope that begins just after
      the appearance of the tag in a type specifier that declares the tag.
      2. Each enumeration constant has scope that begins just after the
      appearance of its defining enumerator in an enumerator list.
      3. Any other identifier has scope that begins just after the
      completion of its declarator.

      The UPS C Interpreter follows somewhat different rules.

      1. Structure and union tags have scope that begins just after the
      appearance of the tag in a type specifier that declares the
      tag. Enumeration tags come into scope at the end of the declaration
      or at the first reference.
      2. Enumeration constants come into scope after the complete declaration
      of the enumeration type.
      3. Variables come into scope after their declaration terminates (when
      the ';' that ends the declaration is encountered).

      The following valid ISO C constructs fail in UPS:

      char * words[] = {
                /* whatever */
      }, **wordlist = words; /* words undefined */
      int intsize = sizeof(intsize); /* intsize undefined */
      int a = 0, b = a; /* a undefined */
      enum { false, true=false+1 }; /* false undefined */
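
Given the rules above, a plausible way to keep such code portable is to end
each declaration before the name is used again. The sketch below assumes
that a name is usable once the ';' of its declaration has been seen, as
rule 3 states. The names count_words and second_value are hypothetical.

```c
#include <stddef.h>

static char *words[] = { "alpha", "beta", NULL };

/* Hypothetical helper: counts the entries before the NULL terminator. */
size_t count_words(void)
{
    char **wordlist;    /* declared on its own ... */
    size_t n = 0;

    wordlist = words;   /* ... and assigned after 'words' is in scope */
    while (wordlist[n] != NULL)
        n += 1;
    return n;
}

/* Hypothetical helper: one declaration per variable instead of
   'int a = 0, b = a + 1;'. */
int second_value(void)
{
    int a = 0;
    int b = a + 1;      /* 'a' was completed by the previous ';' */

    return b;
}
```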

5. Constant expressions do not allow the following constructs:

      * address of (&)
      * variable reference
      * floating point constants in conditional expressions

      int x[][5] = { {1, 2, 3, 4, 0}, { 5, 6 }, { 7 } };
      int *y[] = { x[0], x[1], x[2], 0 }; /* fails */
      extern void func(void);
      void (*fptr)(void) = &func; /* fails */
      int i = 0.25 ? 1 : 0; /* yields 0 instead of 1 */
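
One possible way around the address restriction is to leave the pointer
table uninitialised and fill it in at run time. This is a sketch only; the
function names init_rows and first_of_row are hypothetical, and it assumes
that ordinary pointer assignment inside a function is unaffected.

```c
static int x[3][5] = { {1, 2, 3, 4, 0}, { 5, 6 }, { 7 } };
static int *y[4];   /* no initialiser, so no address constants are needed */

/* Hypothetical set-up routine: stores the row addresses at run time. */
void init_rows(void)
{
    y[0] = x[0];
    y[1] = x[1];
    y[2] = x[2];
    y[3] = 0;
}

/* Hypothetical accessor: returns the first element of the given row. */
int first_of_row(int row)
{
    init_rows();    /* cheap enough to repeat on every call */
    return y[row][0];
}
```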

6. A function declared or defined with an empty parameter list (that is,
      not specified as void) causes cx to complain about a missing prototype.
      This can be suppressed with the -oldc option.

7. All floating point constants are of type 'double'. The suffixes 'F' and
      'L' are ignored.

8. The interpreter assumes that 'int', 'long', 'unsigned', 'unsigned long'
      and 'void *' are the same size (4 bytes, 32 bits).
      It also assumes that 'unsigned long' and 'void *' can be assigned to
      each other without loss of information.
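
When the same source is also built with a native compiler, it may be useful
to know whether that compiler shares these size assumptions. The check
below is a sketch; the function name ups_size_assumptions_hold is
hypothetical and is not part of UPS.

```c
/* Hypothetical check: returns 1 if the compiler building this file uses
   the 4-byte sizes that the interpreter assumes, and 0 otherwise. */
int ups_size_assumptions_hold(void)
{
    return sizeof(int) == 4
        && sizeof(long) == 4
        && sizeof(unsigned) == 4
        && sizeof(unsigned long) == 4
        && sizeof(void *) == 4;
}
```

On a typical 64-bit compiler this returns 0, because pointers there are 8
bytes wide.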

9. 'long double' is a synonym for 'double' and not a distinct type.

10. 'signed char' is a synonym for 'char' and not a distinct type.

11. Casts are no-ops in the following situations:

        a) In constant expressions.

        b) In non-constant expressions where a 32-bit integer is being cast
              to a narrower (16 or 8 bits) type.
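
If code relies on a narrowing cast to discard high-order bits, an explicit
mask gives the same result for unsigned values without depending on the
cast. This is a sketch; the names low_byte and low_word are hypothetical.

```c
/* Hypothetical helper: keeps the low 8 bits, as a cast to unsigned char
   would on a conforming compiler. */
unsigned long low_byte(unsigned long value)
{
    return value & 0xFFUL;
}

/* Hypothetical helper: keeps the low 16 bits. */
unsigned long low_word(unsigned long value)
{
    return value & 0xFFFFUL;
}
```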

12. Non-interpreted functions (such as built-ins) which return aggregate
        types (or true long doubles) cannot be called from interpreted code.

        Thus the Standard C functions div() and ldiv() cannot be called from
        interpreted code.
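
A substitute for div() can be written in interpreted code with the / and %
operators. The sketch below uses a hypothetical function, int_divmod, that
returns both results through pointers. Note that C90 leaves the rounding
direction of / implementation-defined for negative operands, whereas div()
always truncates toward zero, so the two are only guaranteed to agree for
non-negative operands.

```c
/* Hypothetical replacement for div(): stores quotient and remainder
   through the two pointers. denom must not be zero. */
void int_divmod(int numer, int denom, int *quot, int *rem)
{
    *quot = numer / denom;
    *rem  = numer % denom;
}
```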

13. When strings are assigned to arrays - the length of the string (including
        the '\0' byte) must not exceed the array size. This is
        different from ISO C where extra characters are discarded and
        the array is not null terminated if the string literal is longer
        than will fit.
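
The simplest way to stay within this limit is to let the compiler size the
array. The sketch below uses hypothetical names (greeting, greeting_length,
greeting_size) and a file-scope array, since automatic arrays cannot be
initialised under the interpreter.

```c
#include <string.h>

/* Size left to the compiler, so there is always room for the '\0'. */
static char greeting[] = "abc";

/* By contrast, a declaration such as
       static char exact[3] = "abc";
   is accepted by ISO C compilers but, per the rule above, would be
   rejected here because "abc" plus its '\0' needs four bytes. */

/* Hypothetical accessor: number of characters before the '\0'. */
size_t greeting_length(void)
{
    return strlen(greeting);
}

/* Hypothetical accessor: size of the array including the '\0'. */
size_t greeting_size(void)
{
    return sizeof greeting;
}
```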

14. A structure declared in the parameter list of a function creates the
        structure type in translation unit scope - unlike ANSI C where the
        scope of the structure is restricted to the function.

15. Structures resulting from expressions are not l-values. You cannot write:

        if (func().member == value) /**/ ;

        where func() returns a structure by value, and member is a member of
        the structure returned.

        This causes a fatal error (sp botch in ce).
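
A likely workaround is to copy the returned structure into a named variable
and read the member from that copy. This is a sketch under that assumption;
struct point, make_point and point_has_x are hypothetical names.

```c
struct point { int x; int y; };

/* Hypothetical function that returns a structure by value. */
struct point make_point(int x, int y)
{
    struct point p;     /* no initialiser: members are assigned below */

    p.x = x;
    p.y = y;
    return p;
}

/* Hypothetical caller: avoids make_point(x, y).x by using a temporary. */
int point_has_x(int x, int y, int value)
{
    struct point tmp;

    tmp = make_point(x, y);   /* copy the result into a named object */
    return tmp.x == value;    /* then read the member from the copy */
}
```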

Further Information

Further information regarding the UPS C interpreter can be obtained
from the ups-3.34-beta4 distribution.

