New compiler construction tool set

Vladimir Makarov <vmakarov@yesic.com>
8 Jan 1998 00:50:25 -0500

From comp.compilers

Related articles
*New compiler construction tool set vmakarov@yesic.com (Vladimir Makarov)* (1998-01-08)**

| List of all articles for this month |

From:	Vladimir Makarov <vmakarov@yesic.com>
Newsgroups:	comp.compilers
Date:	8 Jan 1998 00:50:25 -0500
Organization:	Compilers Central
Keywords:	tools, available

Hello,

I am Vladimir Makarov. At home I use mainly only free software:
Linux, Gnu Software and etc. Of course I am very gratefull to all
creators of free software. I also would like to help them in my turn.

Because I work practically in one field (compilers) during 15 years, I
created tools for myself in order to make easy implementation of
different language processors and compilers. I named this tool set as
COCOM. Another name of the toolset (Russian Armoury) is given for
simplicity of WEB searching. This toolset is distributed under GNU
public license. You can download the tool set from

http://www.geocities.com/SiliconValley/Lab/4557/download.html

Brief description of the tool set is below

-----------------------------------------------------------------------

COCOM tool set (Russian Armoury)
Vladimir Makarov, vmakarov@usa.net
17 Dec 1997

This document describes COCOM tool set oriented towards the creation
of compilers, cross-compilers, interpreters, and other language
processors.

COCOM tool set is oriented towards the creation of compilers,
cross-compilers, interpreters, and other language processors. Now
COCOM tool set consists of the following components:

        o Ammunition (reusable packages)
        o Sprut (internal representation description translator)
        o Nona (code selector description translator)
        o Msta (syntax description translator)
        o Oka (pipeline hazards description translator)
        o Shilka (keywords description translator)

All of these components are written in ANSI C and have common style
input langauges (a la YACC). All code generated by the components is
in also strict ANSI C and in standard C++. All documentation exists
in ASCII, TeX dvi, Postrcipt, HTML, GNU info, and RTF formats.

If you are interested what the strange names (sprut, msta, and so on)
mean, go to <url url="./armoury.html" name="./armoury.html">.

1. Ammunition (reusable packages)

Currently there are the following packages:

      o allocate
                Allocating and freeing memory with automatic fixing some
                allocation errors.
      o vlobject
                Work with variable length objects (VLO). Any number of bytes
                may be added to and removed from the end of VLO. If it is
                needed the memory allocated for storing variable length object
                may be expanded possibly with changing the object place. But
                between any additions of the bytes (or tailoring) the object
                place is not changed. To decrease number of changes of the
                object place the memory being allocated for the object is
                longer than the current object length.
      o objstack
                Work with stacks of objects (OS). Work with the object on the
                stack top is analogous to one with a variable length object.
                One motivation for the package is the problem of growing char
                strings in symbol tables. Memory for OS is allocated by
                segments. A segment may contain more one objects. The most
                recently allocated segment contains object on the top of OS.
                If there is not sufficient free memory for the top object than
                new segment is created and the top object is transferred into
                the new segment, i.e. there is not any memory reallocation.
                Therefore the top object may change its address. But other
                objects never change address.
      o hashtab
                Work with hash tables. The package permits to work
                simultaneously with several expandable hash tables. Besides
                insertion and search of elements the elements from the hash
                tables can be also removed. The table element can be only a
                pointer. The size of hash tables is not fixed. The hash
                table will be automatically expanded when its occupancy will
                became big.
      o position
                Work with source code positions. The package serves to
                support information about source positions of compiled files
                taking all included files into account.
      o errors
                Output of compiler messages. The package serves output
                one-pass or multi-pass compiler messages of various modes
                (errors, warnings, fatal, system errors and appended messages)
                in Unix style or for traditional listing. The package also
                permits adequate error reporting for included files.
      o commline
                Work with command line. The package implements features
                analogous to ones of public domain function `getopt'. The
                goal of the package creation is to use more readable language
                of command line description and to use command line
                description as help output of program.
      o ticker
                Simultaneous work with several tickers (timers).
      o bits
                Work with bit strings (copying, moving, setting, testing,
                comparison).
      o arithm
                Implementing host machine-independently arbitrary precision
                integer numbers arithmetic. The implementation of the package
                functions are not sufficiently efficient in order to use for
                run-time. The package functions are oriented to implement
                constant-folding in compilers, cross-compilers.
      o IEEE
                Implementing host machine-independently IEEE floating point
                arithmetic. The implementation of the package functions are
                not sufficiently efficient in order to use for run-time. The
                package functions are oriented to implement constant-folding
                in compilers, cross-compilers.

Current state: Implemented, documented, and tested. All these
packages have been used in several products.

Under development: Design of some reusable packages for optimizing
compilers.

2. SPRUT (internal representation description tranlator)

SPRUT is a translator of a compiler internal representation
description (IRD) into Standard Procedural Interface (SPI). The most
convenient form of the internal representation is a directed graph.
IRD defines structure of the graph. SPI provides general graph
manipulating functions. The defined graph nodes can be decorated with
attributes of arbitrary types.

IRD declares types of nodes of the graph. Nodes contains fields, part
of them represents links between nodes, and another part of them
stores attributes of arbitrary types. To make easy describing
internal representation the IRD supports explicitly multiple
inheritance in node types. There can be several levels of internal
representation description in separate files. The nodes of one level
refer to the nodes of previous levels. Therefore each next level
enriches source program internal representation. For example, the
zero level representation may be internal representation for scanner,
the first level may be internal representation for parser, and so on.

SPI can contains functions to construct and destroy graphs and graph
nodes, to copy graphs or graph nodes, to read and write graphs or
graph nodes from (to) files, to print graphs or graph nodes, to check
up constraints on graph, to traverse graphs, and to transform acyclic
graphs in some commonly used manners. SPI can also check up the most
important constraints on internal representation during work with node
fields. SPI can automatically maintain back links between internal
representation nodes.

Using SPRUT has the following advantages:

          1. brief and concise notation for internal representation
          2. improving maintainability and as consequence reliability of
                the compiler
          3. user is freed from the task of writing large amounts of
                relatively simple code

Current state: Implemented, documented, and tested. SPRUT has been
used in several products (the biggest one is extended Pascal
cross-compiler with moderate optimizations and with 3 different
internal representations).

Under development: Error recovery.

3. NONA (code selector description translator)

NONA is a translator of a machine description (MD) into code for
solving code selection and possibly other back-end tasks. The machine
description is mainly intended for describing code selection task
solution, i.e. for determining by machine-independent way a
transformation of a low level internal representation of source
program into machine instruction level internal representation. But
the machine description can be used also to locate machine dependent
code for solving other back-end task, e.g. register allocation. To
describe machine description a special language is used.

An machine description describes mainly tree patterns of low level
internal representation with associated costs and semantic actions.
NONA generates the tree matcher which builds cover of low level
internal representation by the tree patterns with minimal cost on the
first bottom up pass and fulfills actions associated with the choiced
tree patterns on the second bottom up pass. Usually the actions
contain code to output assembler instruction.

Analogous approach for solving code selection task is used by modern
generator generators such as BEG, Twig, Burg and Iburg. The tree
matcher generated by NONA uses algorithm similar to one of BEG and
Iburg, i.e. the algorithm is based on dynamic programming during
fulfilling code selection.

Although the algorithm used by BURG and based on dynamic programming
during tree pattern matcher generation time is considerably more fast,
it is not acceptable for us. Its main drawback which is to need usage
of less powerful machine description results in necessity of usage of
more machine-dependent low level internal representation. For
example, the special internal representation node types for 8-bits,
16-bits constants besides 32-bits constants would be needed. Also the
algorithm used by BURG is considerably more complex.

Tree pattern matchers generated by NONA also can work with directed
acyclic graphs besides trees. This feature is useful when target
machine instruction is generated from the internal representation
which is result of some optimizations such as common sub-expression
elimination.

Current state: Implemented, documented (only plain text), and tested.
NONA has been used in several products (the biggest is extended Pascal
cross-compiler for superscalar RISC processor `AMD 29500' with
moderate optimizations).

Under development: Additional generation of the tree pattern matcher
based on dynamic programming during generation of the tree pattern
matcher. Pascal implementation experience shows that time of the tree
pattern matcher work is practically the same as the time of all
front-end work.

4. MSTA (syntax description translator)

The MSTA can emulate YACC (Posix standard). The MSTA have the
following additional features:

        o Fast LR(k) and LALR(k) grammars (with possibility resolution of
            conflicts). Look ahead of only necessary depth (not necessary
            given k). Originally LALR(k) parsers are generated by modified
            fast DeRemer's algorithm.

        o Extended Backus-Naur Form (EBNF), and constructions for more
            convinient description of the scanners. More convinient
            naming attributes.

        o Optimizations (extracting LALR- and regular parts of grammars and
            implementing parsing them by adequate methods) which permit to
            use MSTA for generation of effective lexical analyzers. As
            consequence MSTA permits to describe easyly (by CFG) scanners
            which can not be described by regular expressions (i.e. nested
            comments).

        o More safe error recovery and reporting (besides error
            recovery method of YACC).

        o Fast generation of fast parsers.

Current state: C++ interface has not implemented yet. Experimental
status. Better documentation is needed.

MSTA uses several methods (parser optimizations) nowhere described.

5. OKA (pipeline hazards desrciption translator)

OKA is a translator of a processor pipeline hazards description (PHD)
into code for fast recognition of pipeline hazards. A pipeline
hazards description describes mainly reservations of processor
functional units by an instruction during its execution. The
instruction reservations are given by regular expression describing
nondeterministic finite state automaton (NDFA). All analogous tools
are based only on deterministic finite state automaton (DFA).

OKA is accompanied with the scheduler on C and C++ for scheduling
basic blocks.

Current state: Implemented, documented, and tested. OKA has been used
in experimental C/C++ compiler for Alpha.

6. SHILKA (keywords description translator)

SHILKA is oriented to fast recognition of keywords and standard
identifiers in compilers. SHILKA is analogous to GNU package `gperf'
but based on minimal pruned O-trie which can take into accout the
frequency of keyword occurences in the program. Gperf can not make
it. SHILKA is up to 20% faster than Gperf. SHILKA is also simpler
than Gperf in the usage.

Current state: Implemented, documented, and tested.

------------------------------------------------------------------------------

Vladimir N. Makarov <vmakarov@usa.net>
http://www.geocities.com/SiliconValley/Lab/4557
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

New compiler construction tool set

Vladimir Makarov <vmakarov@yesic.com>8 Jan 1998 00:50:25 -0500

Vladimir Makarov <vmakarov@yesic.com>
8 Jan 1998 00:50:25 -0500