The UPS C Interpreter

Dibyendu Majumdar <dibyendu@mazumdar.demon.co.uk>
30 Jul 1999 22:44:59 -0400

          From comp.compilers

From: Dibyendu Majumdar <dibyendu@mazumdar.demon.co.uk>
Newsgroups: comp.compilers
Date: 30 Jul 1999 22:44:59 -0400
Organization: Compilers Central
Keywords: C, interpreter

Introduction
------------
The UPS C Debugger includes a reasonably complete ISO C Interpreter.
The C Interpreter is completely independent of the UPS C Debugger, and
can be built and used in its own right. The standard UPS distribution
does not document the interpreter; however, it does describe the
mechanism used to "link" interpreter object files into interpreter
"executables". This information is in the ups/doc/linking.ms document.


Information about UPS is available from
http://www.concerto.demon.co.uk/UPS .


This document describes the C Interpreter in ups-3.34-beta4. The C
Interpreter in ups-3.33 does not work in stand-alone mode.


Interpreter usage
-----------------
The C interpreter comes in two flavours.


1. To compile and interpret a single .c file, you can use 'cg'. 'cg' is
      used as follows:


      cg [-I<dir>] [-oldc] [-D<sym>[=<value>]] [-U<sym>] <source file>
                <arg1> <arg2> ...


      The -I, -D and -U flags have the same meaning as in most C compilers.


      The -oldc flag suppresses a few warning messages related to the
      use of pre-ANSI C features.


      Note that the C compiler must be in the PATH when executing 'cg'.
      This is because 'cg' calls the compiler with the -E flag to
      preprocess the source file. The -I, -D, and -U flags are passed to
      the compiler and ignored by 'cg' itself.


      By default the C compiler used as a preprocessor is the one that
      was used to build UPS. See the build instructions if you want to
      change this.


2. The UPS C interpreter can also interpret applications built from
      multiple source files. This is achieved as follows:


      First, each source is compiled into bytecode using 'cx'. 'cx'
      produces .ocx files - which are similar in purpose to the .o files
      produced by 'gcc'.


      After all sources have been compiled, the .ocx files are linked to
      create the executable. The executable does not contain the .ocx
      files - rather it contains references to them. This allows it to
      be very compact.


      Most of the real code continues to reside in the .ocx files, and is
      loaded at run-time on demand.


      Once the executable has been built it can be executed by the 'xc'
      command. Alternatively, if 'xc' is placed in the appropriate
      directory, the executable can be run directly from the UNIX prompt
      using the shell '#!' facility.


      The syntax for 'cx' is as follows:


      cx [-o executable] [-c] [-g] [-asm] [-S] [-oldc] [-D<sym>[=<value>]]
                [-U<sym>] [-I<dir>] <source files ...>


      The various options are summarized below:


                -o      The argument names the executable. Default is cx.out.
                -c      Means compile only.
                -g      Include debugging information. Allows interpreter
                        executables to be debugged using UPS.
                -S      Produce assembler source (with original source
                        indicated). Requires -g.
                -asm    Produce assembler source only.
                -oldc   Suppress warnings related to use of pre-ANSI features.
                -I      Look for header files included with <> in <dir>.
                        More than one of this option may be supplied.
                -D      Define preprocessor symbol.
                -U      Undefine preprocessor symbol.


      Note that the GNU C compiler must be in the PATH when executing
      'cg'. This is because 'cg' calls 'gcc' with the -E flag to
      preprocess the source file. The -I, -D, and -U flags are passed to
      'gcc' and ignored by 'cg' itself.


      Note that the assembler source shows the assembly language used by
      the interpreter - it cannot be processed further in any way - and
      is useful only for understanding/debugging the interpreter.




Both cg and cx recognise the following environment variables:


                LEX_DEBUG - if set, causes the lexical analyzer to dump
                            debugging messages.


3. To execute an interpreter binary (created by 'cx'), use 'xc' as
      follows:


      xc [-s<n>] <binary> [<args ...>]


      The -s option allows you to specify a stack size in bytes. The
      default stack size is 40000 bytes.


The Language
------------
The UPS C Interpreter implements a good subset of the ISO C 90 language.


It gets some things wrong (see PROBLEMS for a list of these), and does
not implement some features of ISO C, such as:


                * Multi-byte characters/strings
                * Wide characters/strings
                * Trigraphs
                * Initialization of local aggregates


Support for ISO C library
-------------------------
The Interpreter has in-built support for many ISO C library functions.
The following headers are not supported:


                wchar.h
                setjmp.h
                locale.h
                wctype.h
                iso646.h


Apart from the above, the following C library functions/macros/typedefs
are not supported:


                gets()
                tmpfile()
                strcoll()
                strxfrm()
                atexit()
                div()
                ldiv()
                mblen()
                mbtowc()
                mbstowcs()
                wcstombs()
                MB_CUR_MAX
                wchar_t


The header files provided can be used to obtain a subset of the ISO C
environment on Linux. The headers may need customization on other
platforms.


Note that the support for header files and functions is operating system
dependent.


PROBLEMS
--------
The UPS C Interpreter has a number of problems but none of these are
show-stoppers (in my opinion). As far as I know, the interpreter is
ISO C compliant in other respects (please let me know if you discover
otherwise).


1. Non-enum values cannot be assigned to enum variables without a
      cast.


      enum bool { false, true };
      typedef enum bool bool;


      int main(void)
      {
              bool bad = 5; /* fails */
              bool good = (bool)5; /* ok */
              return 0;
      }


2. The increment (++) or decrement (--) operators do not work with
      floating point values.
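
      A minimal sketch of one way to work around this, assuming that
      ordinary addition and assignment on floating point values are
      unaffected (the text above only mentions ++ and --). The function
      name half_steps is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: adds 0.5 to a running total n times.
   The floating point counter is stepped with plain assignment
   instead of the ++ operator. */
double half_steps(int n)
{
    double total = 0.0;
    double count = 0.0;

    assert(n >= 0);
    while (count < n) {
        total = total + 0.5;
        count = count + 1.0;    /* instead of count++ */
    }
    return total;
}
```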


3. Automatic arrays and structures cannot be initialised in functions.
      For example, this doesn't work:


      int func(void)
      {
            int array[] = { 1, 2 };
            return array[0];
      }


      However, the following works fine:


      int func(void)
      {
            static int array[] = { 1, 2 };
            return array[0];
      }
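
      When a static array is not suitable (for instance because each
      call needs its own copy), the elements can be assigned one by one
      after the declaration. This is only a sketch; the function name
      and the values are made up for illustration.

```c
#include <assert.h>

/* Hypothetical example: an automatic array filled by assignment
   rather than by an initialiser list. */
int sum_first_squares(int n)
{
    int squares[4];
    int i;
    int total = 0;

    assert(n >= 0 && n <= 4);
    squares[0] = 0;
    squares[1] = 1;
    squares[2] = 4;
    squares[3] = 9;
    for (i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```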


4. ISO C rules for identifier scopes are:


      1. Structure, union, and enumeration tags have scope that begins
         just after the appearance of the tag in a type specifier that
         declares the tag.
      2. Each enumeration constant has scope that begins just after the
         appearance of its defining enumerator in an enumerator list.
      3. Any other identifier has scope that begins just after the
         completion of its declarator.


      The UPS C Interpreter follows somewhat different rules.


      1. Structure and union tags have scope that begins just after the
         appearance of the tag in a type specifier that declares the
         tag. Enumeration tags come into scope at the end of the
         declaration or at the first reference.
      2. Enumeration constants come into scope after the complete
         declaration of the enumeration type.
      3. Variables come into scope after their declaration terminates
         (when the ';' that ends the declaration is encountered).


      The following valid ISO C constructs fail in UPS:


      char * words[] = {
                /* whatever */
      }, **wordlist = words; /* words undefined */
      int intsize = sizeof(intsize); /* intsize undefined */
      int a = 0, b = a; /* a undefined */
      enum { false, true=false+1 }; /* false undefined */
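
      The constructs above can usually be rephrased so that no
      identifier is used before the ';' that ends its declaration. The
      sketch below shows one such rewrite; the names flag_off, flag_on
      and scope_demo are hypothetical, and wordlist is assigned at run
      time because constant expressions have their own restrictions
      (see problem 5).

```c
#include <assert.h>

/* Enumerators are given literal values instead of referring to
   earlier enumerators of the same declaration. */
enum flag { flag_off = 0, flag_on = 1 };

char *words[] = { "alpha", "beta" };
char **wordlist;                /* set in scope_demo(), not in an initialiser */

int scope_demo(void)
{
    int a = 0;
    int b = a;                  /* separate declaration, so 'a' is in scope */
    int intsize = sizeof(int);  /* names the type, not the variable itself */

    wordlist = words;
    assert(wordlist[0] == words[0]);
    return b + flag_on + (intsize > 0);
}
```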


5. Constant expressions do not allow the following constructs:


      * address of (&)
      * variable reference
      * floating point constants in conditional expressions


      Example:
      int x[][5] = { {1, 2, 3, 4, 0}, { 5, 6 }, { 7 } };
      int *y[] = { x[0], x[1], x[2], 0 }; /* fails */
      extern void func(void);
      void (*fptr)(void) = &func; /* fails */
      int i = 0.25 ? 1 : 0; /* yields 0 in UPS; ISO C gives 1 */
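
      One possible workaround, sketched under the assumption that the
      restriction applies only to initialisers and not to ordinary
      assignments, is to leave such objects uninitialised and fill them
      in from a set-up function called at program start. The function
      name init_tables is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

int x[][5] = { {1, 2, 3, 4, 0}, { 5, 6 }, { 7 } };
int *y[4];                      /* filled in by init_tables() */

void func(void)
{
}

void (*fptr)(void);             /* filled in by init_tables() */

/* Hypothetical set-up routine: performs at run time the address
   computations that the initialisers could not express. */
void init_tables(void)
{
    int i;

    for (i = 0; i < 3; i++)
        y[i] = x[i];
    y[3] = NULL;
    fptr = func;
    assert(fptr != NULL);
}
```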


6. A function declared or defined without a parameter list, and not
      specified as void, causes 'cx' to complain about a missing
      prototype. This can be suppressed with the -oldc option.


7. All floating point constants are of type 'double'. The suffixes 'F'
      or 'L' are ignored.


8. The interpreter assumes that 'int', 'long', 'unsigned', 'unsigned
      long' and 'void *' are the same size (4 bytes, 32 bits).
      It also assumes that 'unsigned long' and 'void *' can be assigned to
      each other without loss of information.
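
      Whether a given host matches these assumptions can be checked
      with a few sizeof comparisons compiled by the native C compiler.
      This is an illustrative sketch only; the function name is
      hypothetical and nothing like it ships with UPS as far as this
      document states.

```c
#include <assert.h>

/* Returns 1 if the host's type sizes match what the interpreter
   is described as assuming (all five types 4 bytes), else 0. */
int sizes_match_assumptions(void)
{
    int ok = sizeof(int) == 4
          && sizeof(long) == 4
          && sizeof(unsigned) == 4
          && sizeof(unsigned long) == 4
          && sizeof(void *) == 4;

    /* ISO C guarantees signed and unsigned int share a size. */
    assert(sizeof(unsigned) == sizeof(int));
    return ok;
}
```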


9. 'long double' is a synonym for 'double' and not a distinct type.


10. 'signed char' is a synonym for 'char' and not a distinct type.


11. Casts are NOOPs in the following situations:


        a) In constant expressions.


        b) Also in non-constant expressions where a 32-bit integer is
              being cast to a narrower (16 or 8 bit) type.
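
        Where code relies on a narrowing cast to discard high-order
        bits, the truncation can be written out explicitly. The sketch
        below assumes that the bitwise AND operator is unaffected and
        that the host uses two's complement; both function names are
        hypothetical.

```c
#include <assert.h>

/* Keeps only the low 8 bits, as a cast to unsigned char would. */
long low_byte(long value)
{
    return value & 0xFFL;
}

/* Mimics the usual result of a cast to signed char on a two's
   complement machine: keep 8 bits, then sign-extend. */
long as_signed_byte(long value)
{
    long low = value & 0xFFL;
    long result = low;

    if (low >= 0x80L)
        result = low - 0x100L;
    assert(result >= -128L && result <= 127L);
    return result;
}
```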


12. Non-interpreted functions (such as built-ins) which return aggregate
        types (or true long doubles) cannot be called from interpreted
        code.


        Thus the Standard C functions div() and ldiv() cannot be called
        from interpreted code.
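
        The quotient and remainder can be computed with the / and %
        operators instead. The helper below is a hypothetical
        stand-in, not part of UPS. Note that in C90 the results of /
        and % are implementation-defined when an operand is negative,
        whereas div() always truncates toward zero.

```c
#include <assert.h>

/* Hypothetical replacement for div(): stores the quotient and
   remainder through pointers instead of returning a structure. */
void quot_rem(int numer, int denom, int *quot, int *rem)
{
    assert(denom != 0);
    *quot = numer / denom;
    *rem = numer % denom;
}
```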


13. When strings are assigned to arrays - the length of the string
        (including the '\0' byte) must not exceed the array size. This
        is different from ISO C where extra characters are discarded and
        the array is not null terminated if the string literal is longer
        than will fit.
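
        For example, char code[3] = "abc"; is accepted by ISO C but,
        by the rule above, not by the interpreter. A possible
        alternative, assuming memcpy() is among the supported library
        functions (the document does not list it as unsupported), is to
        copy the characters at run time. The names code and set_code
        are hypothetical.

```c
#include <assert.h>
#include <string.h>

char code[3];                   /* exactly three characters, no '\0' */

/* Copies the first three characters of text into code without
   a terminating null byte. */
void set_code(const char *text)
{
    assert(strlen(text) >= sizeof code);
    memcpy(code, text, sizeof code);
}
```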


14. A structure declared in the parameter list of a function creates the
        structure type in translation unit scope - unlike ANSI C where
        the scope of the structure is restricted to the function.


15. Structures resulting from expressions are not l-values. You cannot
        code:


        if (func().member == value) /**/ ;


        where func() returns a structure by value, and member is a
        member of the structure returned.


        This causes a fatal error (sp botch in ce).
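
        A sketch of a rewrite that avoids the construct, assuming that
        assigning a returned structure to a variable works as in ISO C:
        store the result in a temporary first, then access the member.
        The structure and function names are hypothetical.

```c
#include <assert.h>

struct pair {
    int first;
    int second;
};

struct pair make_pair(int first, int second)
{
    struct pair p;

    p.first = first;
    p.second = second;
    return p;
}

/* Tests a member of a returned structure via a temporary
   instead of writing make_pair(...).second directly. */
int second_is(int first, int second, int value)
{
    struct pair tmp;

    tmp = make_pair(first, second);
    assert(tmp.first == first);     /* sanity check on the copy */
    return tmp.second == value;
}
```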


Further Information
-------------------


Further information regarding the UPS C interpreter can be obtained
from the ups-3.34-beta4 distribution.


Regards

