Extensible compilers (debugging)

mtrofimov@glas.apc.org (Michael Trofimov)
30 Mar 1998 21:44:07 -0500

          From comp.compilers


From: mtrofimov@glas.apc.org (Michael Trofimov)
Newsgroups: comp.compilers
Date: 30 Mar 1998 21:44:07 -0500
Organization: Compilers Central
Keywords: debug

Extensible compilers (debugging):

Our moderator wrote:

> [The debugging nightmare I was worried about was the one where you
> can add your own optimizing routines to the compiler, so bugs in
> those routines can cause broken code anywhere in the resulting
> program. -John]

For the Universal Compiler Shell (UCS), which I described earlier, I use
the following to make debugging easy:

1. Implementation language. I chose Pascal because I think it is
better suited than C/C++ for experiments of this sort.

2. Platform for the initial implementation. I chose the Mac
(but I plan to port UCS to the IBM PC later).

3. Environment: MPW and the SADE source-level debugger.

4. Programming paradigm: modular (non-OOP).

5. Dump. A simple dump option records the parsing process.

6. There are no recursive calls in the parser, so a bug detected via
the dump file is easy to localize. This also makes it possible to
write well-structured UCS sources.

The following fragment is an example of a dump:

1: program prog;
2: #token="program" keyWordKd
3: ->synt: "program" programV #1 _in_ programV -- ok
4: INCLUDE 'PRunTime.a' ; genProgStartInstr
5: INCLUDE 'SegLoad.a' ; genProgStartInstr
6: #token="prog" identKd
7: ->synt: "prog" #251 _in_ programV -- ok
8: #token=";" otherKd
9: ->synt: ";" #256 _in_ programV -- ok

Line #1 is a line from the input stream.
Line #2: the token "program" is recognized as a keyword (ProgramSy value).
Line #3: the syntax analyzer found node #1 in the syntax graph; this
node has the value ProgramSy. (programV is the current subgraph, i.e.
the name of the current syntax diagram.)
Lines #4 and #5: the code generator's output.
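To make the dump format concrete, here is a minimal sketch of two formatting helpers that produce lines like #2 and #3 above. FormatToken and FormatSynt are hypothetical names, not taken from the UCS sources, and the sketch assumes that the optional name after the quoted token (programV on line #3, absent on line #7) is the node's display name, such as the one set by addShowName.

```pascal
program DumpSketch;
{ Sketch only: hypothetical helpers that build dump lines in the
  format shown above. Not the UCS implementation. }

function IntStr(n: Integer): string;
var
  s: string;
begin
  Str(n, s);
  IntStr := s
end;

{ Builds e.g. '#token="program" keyWordKd' from the token text
  and its lexical kind }
function FormatToken(const tok, kind: string): string;
begin
  FormatToken := '#token="' + tok + '" ' + kind
end;

{ Builds e.g. '->synt: "program" programV #1 _in_ programV -- ok'
  sym      : the token being matched
  showName : display name of the node (assumption; may be empty)
  node     : number of the syntax-graph node that was reached
  subgraph : name of the current syntax diagram
  matched  : whether the node accepted the token }
function FormatSynt(const sym, showName: string; node: Integer;
                    const subgraph: string; matched: Boolean): string;
var
  s: string;
begin
  s := '->synt: "' + sym + '" ';
  if showName <> '' then
    s := s + showName + ' ';
  s := s + '#' + IntStr(node) + ' _in_ ' + subgraph;
  if matched then
    s := s + ' -- ok';
  FormatSynt := s
end;

begin
  WriteLn(FormatToken('program', 'keyWordKd'));
  WriteLn(FormatSynt('program', 'programV', 1, 'programV', True));
  WriteLn(FormatToken('prog', 'identKd'));
  WriteLn(FormatSynt('prog', '', 251, 'programV', True))
end.
```

Keeping every dump line to one fixed, flat format is what makes the dump easy to read and, if desired, to check mechanically.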

If node #1 has the wrong value (not ProgramSy), or if it is disconnected
from node #251 (line #7), the user will see these bugs in the dump. If
a user adds a buggy optimizing procedure to the parser, he/she will
also see a wrong dump. For example, suppose line #3 reads:

3: ->synt: "program" programV #1 _in_ programV
(" -- ok" is missing from this line),

and node #1 has the right value (the user can check it via the Syntax Editor).

This means that the added procedure interprets the node incorrectly.

Or, for example, suppose line #3 reads:

3: ->synt: "program" programV #251 _in_ programV

This means that the syntax analyzer jumped over node #1.
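The two symptoms just described can also be told apart mechanically. The following is a minimal sketch, assuming the user supplies the node number he expects on a given `->synt:` line; the names EndsOk, NodeNumber and Diagnose are hypothetical and not part of UCS.

```pascal
program CheckDump;
{ Sketch: classify a '->synt:' dump line against the node the user
  expects. The expected node is given by hand here (assumption). }

const
  OkSuffix = ' -- ok';

{ True if the line ends in ' -- ok' }
function EndsOk(const line: string): Boolean;
begin
  EndsOk := (Length(line) >= Length(OkSuffix)) and
            (Copy(line, Length(line) - Length(OkSuffix) + 1,
                  Length(OkSuffix)) = OkSuffix)
end;

{ Node number after the first '#', or -1 if there is none }
function NodeNumber(const line: string): Integer;
var
  p, n, code: Integer;
  digits: string;
begin
  NodeNumber := -1;
  p := Pos('#', line);
  if p = 0 then Exit;
  digits := '';
  Inc(p);
  while (p <= Length(line)) and (line[p] in ['0'..'9']) do
  begin
    digits := digits + line[p];
    Inc(p)
  end;
  if digits <> '' then
  begin
    Val(digits, n, code);
    if code = 0 then NodeNumber := n
  end
end;

{ Wrong node number: the analyzer jumped over the expected node.
  Right node but no ' -- ok': the node (or an added procedure)
  rejected the token. }
function Diagnose(const line: string; expectedNode: Integer): string;
begin
  if NodeNumber(line) <> expectedNode then
    Diagnose := 'node skipped'
  else if not EndsOk(line) then
    Diagnose := 'token rejected at node'
  else
    Diagnose := 'ok'
end;

begin
  WriteLn(Diagnose('->synt: "program" programV #1 _in_ programV -- ok', 1));
  WriteLn(Diagnose('->synt: "program" programV #1 _in_ programV', 1));
  WriteLn(Diagnose('->synt: "program" programV #251 _in_ programV', 1))
end.
```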

I think this dump is much easier to understand than a trace of
a classical parser with complex recursive calls.


> bugs in those routines can cause broken code anywhere in the
> resulting program

In that case the bug will be marked not only in line #3 of our example
but perhaps in lines #7 and #9 as well.

7. UCS directives, i.e. commands that a user embeds directly in the
source being parsed. For example, the "~s" directive forces UCS to call
the system error trap with code 8005, which transfers control to SADE,
so the user can continue UCS execution step by step under this
debugger. Thus the user is able to set breakpoints not only in the UCS
sources but also in his test sources.
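A minimal sketch of how such a directive might be filtered out of a source line before tokenizing. HandleDirectives and DebugTrap are hypothetical names; DebugTrap is only a stub standing in for the real system error trap (code 8005) that hands control to SADE, so the sketch can run anywhere.

```pascal
program DirectiveSketch;
{ Sketch: remove '~s' directives from a source line and fire a stub
  debug trap for each one. Not the UCS implementation. }

const
  DebugTrapCode = 8005;

var
  TrapCount: Integer;   { how many times the stub trap fired }

{ Hypothetical stand-in for the system error trap: it only counts
  and reports the call instead of entering a debugger }
procedure DebugTrap(code: Integer);
begin
  Inc(TrapCount);
  WriteLn('debug trap ', code)
end;

{ Returns the line with every '~s' removed, firing the trap once per
  occurrence; any other '~x' sequence is left untouched }
function HandleDirectives(const line: string): string;
var
  i: Integer;
  outLine: string;
begin
  outLine := '';
  i := 1;
  while i <= Length(line) do
    if (line[i] = '~') and (i < Length(line)) and (line[i + 1] = 's') then
    begin
      DebugTrap(DebugTrapCode);
      i := i + 2
    end
    else
    begin
      outLine := outLine + line[i];
      Inc(i)
    end;
  HandleDirectives := outLine
end;

begin
  TrapCount := 0;
  WriteLn(HandleDirectives('program prog;~s'));
  WriteLn('traps fired: ', TrapCount)
end.
```

Because the trap fires at the point in the input where the directive sits, the breakpoint effectively lives in the test source rather than in the UCS code.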

8. Some special features. For example, UCS can read its own source
code (the module with global constants) to load the list of keywords,
etc. So, to add a new keyword, the user writes it in one place in the
UCS source and recompiles.
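A sketch of the keyword-loading idea, assuming (hypothetically) that the constants module declares each keyword on its own line as `someNameSy = 'keyword';`. The post does not show the module's real layout, so QuotedText, AddLine and KeywordList are illustrative names only; the lines are fed from memory here instead of from the source file.

```pascal
program KeywordSketch;
{ Sketch: collect keywords from constant declarations of the assumed
  form  someNameSy = 'keyword';  Other lines are ignored. }

const
  MaxKeywords = 100;

type
  KeywordList = record
    count: Integer;
    words: array[1..MaxKeywords] of string[31];
  end;

{ Text between the first pair of single quotes on a line containing
  '=', or '' if there is no such pair }
function QuotedText(const line: string): string;
var
  first, last: Integer;
begin
  QuotedText := '';
  if Pos('=', line) = 0 then Exit;
  first := Pos('''', line);
  if first = 0 then Exit;
  last := first + 1;
  while (last <= Length(line)) and (line[last] <> '''') do
    Inc(last);
  if last <= Length(line) then
    QuotedText := Copy(line, first + 1, last - first - 1)
end;

{ Adds the keyword found on one source line, if any }
procedure AddLine(var list: KeywordList; const line: string);
var
  kw: string;
begin
  kw := QuotedText(line);
  if (kw <> '') and (list.count < MaxKeywords) then
  begin
    Inc(list.count);
    list.words[list.count] := kw
  end
end;

var
  kws: KeywordList;
  i: Integer;

begin
  kws.count := 0;
  AddLine(kws, '  programSy = ''program'';');
  AddLine(kws, '  beginSy   = ''begin'';');
  AddLine(kws, '  { a comment line is ignored }');
  for i := 1 to kws.count do
    WriteLn(kws.words[i])
end.
```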

9. An idea for supporting two parsers (not implemented yet): a syntax
node is implemented as:

                Node = record
                    nextIfNotFound : Link;    {Link = ^Node;}
                    keyword        : symbols; {for example, ProgramSy}
                    action         : integer;
                    {............... other fields ..............}
                end;

Let's insert a new field into this record:

                oldParser : Boolean;

and insert the following code fragment into UCS:

                if CurrentNode.oldParser

So the user would be able to write his own parser and apply it only to
those syntax nodes he wants to change.
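A minimal sketch of where the oldParser flag could be tested, assuming CurrentNode is a Link (so its fields are reached as CurrentNode^.oldParser). OldParseNode and UserParseNode are hypothetical stand-ins for the built-in UCS routine and a user-written one, and the enumeration values other than programSy are invented for the example.

```pascal
program TwoParsersSketch;
{ Sketch of the two-parser idea; not UCS code. }

type
  Symbols = (programSy, identSy, semicolonSy);   { partly hypothetical }
  Link = ^Node;
  Node = record
    nextIfFound, nextIfNotFound: Link;
    keyword: Symbols;
    action: Integer;
    oldParser: Boolean;   { True: node is handled by the built-in parser }
  end;

{ Built-in behaviour: accept exactly the node's keyword }
function OldParseNode(n: Link; sym: Symbols): Boolean;
begin
  OldParseNode := n^.keyword = sym
end;

{ A user's replacement; as an example it also accepts identSy }
function UserParseNode(n: Link; sym: Symbols): Boolean;
begin
  UserParseNode := (n^.keyword = sym) or (sym = identSy)
end;

{ The dispatch point: each node says which parser handles it, so the
  user overrides only the nodes he marks }
function ParseNode(CurrentNode: Link; sym: Symbols): Boolean;
begin
  if CurrentNode^.oldParser then
    ParseNode := OldParseNode(CurrentNode, sym)
  else
    ParseNode := UserParseNode(CurrentNode, sym)
end;

var
  n: Link;

begin
  New(n);
  n^.nextIfFound := nil;
  n^.nextIfNotFound := nil;
  n^.keyword := programSy;
  n^.action := 0;
  n^.oldParser := True;
  WriteLn(ParseNode(n, identSy));   { built-in parser rejects identSy }
  n^.oldParser := False;
  WriteLn(ParseNode(n, identSy));   { user parser accepts identSy }
  Dispose(n)
end.
```

Since the choice is stored per node, the syntax graph itself records which parser owns which part of the language.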

10. Very moderate source code size.
For the current (development) version of UCS with Pascal syntax:

    Syntax Diagrams Editor    2184 lines
    Syntax analyzer            855 lines
    Other modules             8008 lines
    Total                    11047 lines

(For comparison:

N. Wirth's small Pascal interpreter, Pascal-S, is about 4,000 lines;
the old Pascal 8000/2 for the IBM 360/370 is about 10,000 lines.

And these are very small programs; the sources of other, modern Pascal
compilers are longer!)

In practice, I have not found it hard to detect bugs in a syntax
analyzer as small as that of UCS (only 855 lines!).

Also, note that many long fragments of the UCS sources have a very
simple, regular, understandable structure. For example, the Pascal
syntax initialization procedure has about 1,000 lines, but it consists
mainly of procedure calls and assignments generated via the Editor:

programV := makeSymbVert ([programSy],0,nil,nil,ord
addShowName (programV,'programV');
addToSyntaxList (programV);
v1 := makeSymbVert ([idMeta],0,nil,programV,ord(newSubtreeAct),0,2,1,true);
programV^.nextIfFound := v1;
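The generated code above wires vertices together with plain assignments such as `programV^.nextIfFound := v1`. The sketch below shows, under simplifying assumptions, how a graph built that way can be walked by a loop instead of recursion. MakeVert is a hypothetical, much reduced cousin of makeSymbVert that takes only the accepted symbols and a node number; the other makeSymbVert parameters are not modelled, and entering nested syntax diagrams would additionally need an explicit return stack, which this sketch leaves out.

```pascal
program GraphWalkSketch;
{ Sketch: a syntax graph driven by a loop, not by recursive calls. }

type
  Symbol = (programSy, identSy, semicolonSy, otherSy);
  SymbolSet = set of Symbol;
  Link = ^Vert;
  Vert = record
    accepts: SymbolSet;
    number: Integer;
    nextIfFound, nextIfNotFound: Link;
  end;

{ Hypothetical, reduced constructor in the spirit of makeSymbVert }
function MakeVert(accepts: SymbolSet; number: Integer): Link;
var
  v: Link;
begin
  New(v);
  v^.accepts := accepts;
  v^.number := number;
  v^.nextIfFound := nil;
  v^.nextIfNotFound := nil;
  MakeVert := v
end;

{ A token that matches the current vertex moves to nextIfFound and
  consumes the token; otherwise the alternative nextIfNotFound is
  tried. Returns True if all tokens were consumed when the walk ended. }
function Walk(start: Link; const toks: array of Symbol): Boolean;
var
  v: Link;
  i: Integer;
begin
  v := start;
  i := 0;                                  { open arrays start at 0 }
  while (v <> nil) and (i <= High(toks)) do
    if toks[i] in v^.accepts then
    begin
      WriteLn('#', v^.number, ' -- ok');   { one dump line per match }
      v := v^.nextIfFound;
      Inc(i)
    end
    else
      v := v^.nextIfNotFound;
  Walk := i > High(toks)
end;

var
  programV, v1, v2: Link;

begin
  { program <ident> ; }
  programV := MakeVert([programSy], 1);
  v1 := MakeVert([identSy], 251);
  v2 := MakeVert([semicolonSy], 256);
  programV^.nextIfFound := v1;
  v1^.nextIfFound := v2;
  WriteLn(Walk(programV, [programSy, identSy, semicolonSy]))
end.
```

Because the whole parse state is just the current vertex and the token position, each step maps onto one dump line, which is what makes a bug easy to localize.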

11. Well-designed interface. I saved a lot of time writing the UCS
interface: I designed the GUI resources in ResEdit and generated the
Pascal sources with my MT2Trivial tool (if you would like more
information about this tool, please see <http://www.glasnet.ru/~mtrofimov/>).
Of course, the GUI is not part of the parser, but debugging a parser
behind a poor application interface would be a real nightmare...

Michael I. Trofimov
Russian Academy of Sci. Moscow, Russia
email: mtrofimov@glas.apc.org
