From: | Pascal Bourguignon <pjb@informatimago.com> |
Newsgroups: | comp.compilers |
Date: | 10 Sep 2006 09:54:55 -0400 |
Organization: | Informatimago |
References: | 06-09-029 |
Keywords: | parse, practice, Basic |
Posted-Date: | 10 Sep 2006 09:54:55 EDT |
"Mr.E" <mr.waverlye@verizon.net> writes:
> I recently got my scanner working for my [first] compiler. My
> compiler is for an existing BASIC language. I am seeing the
> complexity of the language at the parsing level. There are
> approximately 600 keyword or keyword combinations which include
> compound words due to intrinsic functions.
>
> Is there a parser generator that produces the equivalent of a
> hand-coded recursive descent parser? I'm looking for a generator that
> doesn't require an engine and doesn't use external libraries... just
> plain old C.
>
> I can watch and debug recursive descent code because I can understand
> that. I cant imagine trying to debug a table driven parser or having
> to rewrite it in BASIC.
>
> The reason for my request is that my compiler will be written in a
> BASIC dialect instead of C. I would generate the parser in ANSI C
> then rewrite it in BASIC. I'm using a BASIC compiler to bootstrap my
> own. It seems to be a good idea to write the compiler in the language
> its going to compileso that's what I'm doing.
Unless you're trapped on a planet light years away, with nothing else
than a BASIC, I see no reason to use that to bootstrap a compiler.
Even if you have to compile the compiler to BASIC for some strange
reason, I see no reason why you shouldn't use more powerful tools to
generate the BASIC code...
So, here is a simple Recursive Descent Parser generator, written in
Common Lisp, that can generate the parser in lisp, or in a
pseudo-basic. You could modify it to generate the code you want, for
your target basic system.
http://www.informatimago.com/develop/lisp/small-cl-pgms/rdp/
For the scanner, it uses a regexp library with a simplistic algorithm
(trying the regexp for each token in turn.
If speed was needed, a smarter algorithm using a fusionned DFA for the
scanner would be in order, and of course, a table based parser too.
> Also are there any algorithms for AST building. Everything I've
> understand tells me that I really want to build an AST and do code
> generation from it versus trying to generate code as I go along. I
> thought I understood the process but I'm not there yet.
There's no specific algorith to build the AST. It's merely built by
the actions associated to the grammar rules, and executed by the
parser when the rule is used.
The reason why it's interesting to build this AST as an intermediate
data structure, is that apart from the most simple cases, some
semantic analysis, and some optimizations are needed, for which we
need a global view of the AST.
For example to check that the declaredtype of the variables matches
the type of the operators with which the variables are used, or to to
type inference.
The remaining phases can be implemented as progressive transformations
of the AST into less and less abstract trees until what remains is a
list of target processor instructions.
Here is an example of the grammar written for my simple recursive
descent parser.
;;; Example taken from: http://en.wikipedia.org/wiki/Recursive_descent_parser
(defgrammar example
:terminals ((ident "[A-Za-z][A-Za-z0-9]*")
;; real must come first to match the longest first.
(real "^\\([-+]\\?[0-9]\\+\\.[0-9]\\+\\([Ee][-+]\\?[0-9]\\+\\)\\?\\)")
(integer "[-+]\\?[0-9]\\+"))
:start program
:rules ((--> factor
(alt ident
number
(seq "(" expression ")" :action $2))
:action $1)
(--> number (alt integer real) :action $1)
(--> term
factor (rep (alt "*" "/") factor)
:action `(,$1 . ,$2))
(--> expression
(opt (alt "+" "-"))
term
(rep (alt "+" "-") term :action `(,$1 ,$2))
:action `(+ ,(if $1 `(,$1 ,$2) $2) . ,$3))
(--> condition
(alt (seq "odd" expression
:action `(oddp ,$2))
(seq expression
(alt "=" "#" "<" "<=" ">" ">=")
expression
:action `(,$2 ,$1 ,$3)))
:action $1)
(--> statement
(opt (alt (seq ident ":=" expression
:action `(setf ,$1 ,$3))
(seq "call" ident
:action `(call ,$2))
(seq "begin" statement
(rep ";" statement
:action $2)
"end"
:action `(,$2 . ,$3))
(seq "if" condition "then" statement
:action `(if ,$2 ,$4))
(seq "while" condition "do" statement
:action `(while ,$2 ,$4))))
:action $1)
(--> block
(opt "const" ident "=" number
(rep "," ident "=" number
:action `(,$2 ,$4))
";"
:action `((,$2 ,$4) . ,$5))
(opt "var" ident
(rep "," ident :action $2)
";"
:action `(,$2 . ,$3))
(rep "procedure" ident ";" block ";"
:action `(procedure ,$2 ,$4))
statement
:action `(block ,$1 ,$2 ,$3 ,$4))
(--> program
block "." :action $1)))
The forms following the :ACTION keyword build the AST nodes from the
nodes returned by the subnodes identified by $1, $2, etc...
And here is an example of (lisp) AST built by the generated parser;
(parse-example
"
const abc = 123,
pi=3.141592e+0;
var a,b,c;
procedure gcd;
begin
while a # b do
begin
if a<b then b:=b-a ;
if a>b then a:=a-b
end
end;
begin
a:=42;
b:=30.0;
call gcd
end.")
-->
(BLOCK
(((IDENT "abc" 11) (INTEGER "123" 17))
((IDENT "pi" 32) (REAL "3.141592e+0" 35)))
((IDENT "a" 57) (IDENT "b" 59) (IDENT "c" 61))
((PROCEDURE (IDENT "gcd" 79)
(BLOCK NIL NIL NIL
((WHILE (("#" "#" 112) (+ ((IDENT "a" 110))) (+ ((IDENT "b" 114))))
((IF (("<" "<" 151) (+ ((IDENT "a" 150))) (+ ((IDENT "b" 152))))
(SETF (IDENT "b" 159)
(+ ((IDENT "b" 162)) (("-" "-" 163) ((IDENT "a" 164))))))
(IF ((">" ">" 186) (+ ((IDENT "a" 185))) (+ ((IDENT "b" 187))))
(SETF (IDENT "a" 194)
(+ ((IDENT "a" 197)) (("-" "-" 198) ((IDENT "b" 199))))))))))))
((SETF (IDENT "a" 235) (+ ((INTEGER "42" 238))))
(SETF (IDENT "b" 246) (+ ((REAL "30.0" 249)))) (CALL (IDENT "gcd" 264))))
(The integers in third position in the sublists are the positions in
the source of the corresponding token.)
--
__Pascal Bourguignon__ http://www.informatimago.com/
Return to the
comp.compilers page.
Search the
comp.compilers archives again.