Related articles |
---|
newbie grammar/parser/interpreter questions craigh@cserver.plexus.com (1992-05-07) |
Re: newbie grammar/parser/interpreter questions parrt@ecn.purdue.edu (1992-05-08) |
experience with code transformation systems? unisoft@tigger.jvnc.net (1992-05-09) |
Newsgroups: | comp.compilers |
From: | craigh@cserver.plexus.com (Craig Heilman) |
Keywords: | parse, question |
Organization: | Compilers Central |
Date: | Thu, 7 May 1992 21:38:01 GMT |
Hi,
I'm in the process of writing a compiler (actually an interpreter) for a
psuedo-object-oriented language. I have been reading a book "Compiler
Design in C" by A. I. Holub and have learned a lot but I still have some
implementation-specific questions. My basic goal is to create a correct
grammar for my language and from there determine whether I will hard-code
the parser/interpreter or rely on a tool like yacc or bison to generate a
parser for me. I also have to figure out how to "generate code" that can
be executed right after a valid parse. This all happens internal to my
device which has no disk space, minimal RAM (although I do have dynamic
memory) and a slow processor (80x86 family). Implementation languages are
C & assembly.
CURRENT STATUS
I have a working lexical analyzer that currently can recognize all the
discrete constructs of my language. I also have a preliminary recursive-
descent parser that can handle most of the syntax of my language.
Unfortunately there are a few precedence rules I forgot about, thus my
parser is now breaking on valid language statements. My parser currently
builds an actual parse tree when it parses a language statement. I
thought I would be able to just traverse the tree and call some internal
routines, effectively executing the language statement. I thought the
order of traversal (preorder, inorder, postorder) and the structure of the
tree would handle the precedence rules but...
LANGUAGE BASICS
A typical expression looks like this:
target message argument
(ie. in C++ this would be: target.message(argument); )
Nested subexpressions for the target and/or argument are valid using
parenthesis (any level of nesting is allowed). In the example below, the
first subexpression must evaluate to a valid target (at execution time)
and the second subexpression must evaluate to the type of argument that
message2 is expecting (again at execution time).
(target1 message1 argument1) message2 (target3 message3 argument3)
Nested subexpressions for the target and/or argument are also valid using
the precedence associated with the message type (3 types of messages):
target1 message1 target?/arg? message2 argument2
if message2 has higher precedence than message1, the above
expression will be interpreted as if it were written like so:
target1 message1 (target2 message2 argument2)
if message1 has higher precedence than message2, the above
expression will be interpreted as if it were written like so:
(target1 message1 argument1) message2 argument2
Grammar Specification
A simplified grammar for my language is as follows. Please note that this
is my first attempt at creating a grammar so constructive criticism is
welcome.
Notes: Terminals are capitalized or single-quoted and an empty string is
represented by E. The end-of-input marker "|-" is implicit in the first
production. Rules lower in a production have higher precedence.
A. "Statements" consist of one or more "expression's" separated by periods.
statement -> expression
| expression '.' statement
B. There are two types of expressions:
expression -> msg_expression
| block
msg_expression -> object_spec msg
block -> '[' statement ']'
Comparing the msg_expression to the typical language expression
described above, an object_spec corresponds to a target and a msg
corresponds to the message and argument combined. A block is just
a delimited statement that may be treated as a discrete object (ie.
one can send messages to a block like "execute yourself").
C. There can be several msg's following an object_spec, each separated
by semicolons ';'. An optional receiver may also be specified for
each msg_spec.
msg -> receiver msg_spec
| receiver msg_spec ';' msg
receiver -> RECEIVER_SUPERCLASS
| E
D. An object_spec (or "target") may have an optional unit_spec in front
of the actual object description (object_desc). We also take into
account the possibility of a parenthesized subexpression here. Within
a unit_spec, there may be several subunit_desc's. Note that if the
object_desc is empty, the unit_spec forms the object that is the
target of the message.
object_spec -> unit_spec object_desc
| '(' expression ')'
unit_spec -> unit_desc subunit_desc
unit_desc -> UNIT_CLASS UNIT_FUNCTION_ID unit_number
unit_number -> SSHORTINTEGER_CLASS SSHORTINTEGER_DATA
| E
subunit_desc -> subunit_desc' subunit_desc
| E
subunit_desc' -> SUBUNIT_CLASS SUBUNIT_FUNCTION_ID subunit_number
subunit_number -> SSHORTINTEGER_CLASS SSHORTINTEGER_DATA
| E
object_desc -> OBJECT_CLASS_NAME OBJECT_FUNCTION_ID
| E
E. A msg_spec can be one of three types of messages:
msg_spec -> KEYWORD_ID object_value
| ARITHMETIC_ID object_value
| SIMPLE_ID
The precedence of the three types of messages is as shown, ie. a
simple message has higher precedence than an arithmetic message
which has higher precedence than a keyword message. Unfortunately
my parser implementation doesn't handle this precedence correctly
without using parenthesis. Note that a simple message doesn't
take an argument (object_value). Also of importance is the fact
that the message tokens (KEYWORD_ID, ...) are mutually exclusive
with the tokens for the classes (SUBUNIT_CLASS, ...).
F. An object_value can be a discrete value or it could be a block
(which is really just a special type of object) or it could be a
subexpression. The data is just the actual value of an object, for
instance a short integer '3' is represented by SSHORTINTEGER 3 (of
course there is a unique code representing SSHORTINTEGER...).
object_value -> GENERAL_OBJECT_CLASS data
| block
| '(' expression ')'
QUESTIONS
1. Can this grammar be modified to include the precedence of messages
as outlined in section E above, or would it be better to just rewrite
the grammar from scratch?
2. Considering my hardware constraints, would it be prudent to hard-code
the parser as I have done to date, or should I rely on yacc or bison
generated code. (I played with yacc a bit and the output was pretty
unreadable...). I'm somewhat worried about future maintenance.
3. If I hard-code the parser, should I switch to one-of the table-driven
approaches instead of the recursive-descent approach. I found writing
a recursive-descent parser to be fairly simple and it's very easy to
understand and debug. I'm running it with a simple test program on my
workstation - haven't ported to target hardware yet so efficiency hasn't
been an issue, but that may become a concern. My parser does a top-down
parse - should I switch to a bottom-up approach?
4. What would be an appropriate way to represent a statement that has
been parsed correctly so that it may then be executed? In my parser
I currently create a tree structure that represents the statement.
Each node in the tree represents a non-terminal or terminal in my
grammar. Nodes with data (terminals) include the token and lexeme
representing that data. Nodes without data just include the token.
Would it be better to use some sort of stack structure, or do I need
an internal language? I need to be able to execute a statement
immediately upon completion of a parse. Given a statement with the
basic form shown above (target message argument) the internal execution
would proceed as follows: (1) look up target in object table - an
object must be created (in parser) if it doesn't already exist in the
table, (2) look up message code in method table for that type of object
- unless there is a "receiver" specified, in which case you look up the
method code in the receiver's method table, (3) look up argument in the
object table, (4) make the "procedure call" like target.message(argument).
That's it... To anyone who has taken the time to read this and anyone who
can give me a little feedback - I thank you.
Craig
--
Craig A. Heilman, craigh@cserver.plexus.com
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.