Related articles |
---|
Is Fortran90 LL(1)? rutzinger@takefive.co.at (Christian Rutzinger) (1996-04-04) |
Re: Is Fortran90 LL(1)? mjohnson@samson.tx.hac.com (1996-04-08) |
Is Fortran90 LL(1)? dave@occl-cam.demon.co.uk (Dave Lloyd) (1996-04-08) |
Re: Is Fortran90 LL(1)? dlmoore@ix.netcom.com (1996-04-16) |
Re: Is Fortran90 LL(1)? Robert.Corbett@Eng.Sun.COM (1996-04-18) |
From: | dlmoore@ix.netcom.com (David L Moore) |
Newsgroups: | comp.compilers |
Date: | 16 Apr 1996 22:33:52 -0400 |
Organization: | Netcom |
References: | 96-04-024 |
Keywords: | parse, Fortran |
> I want to write a Recursive Descent Parser for Fortran90.
>
> After taking a look on the Fortran90 Standard, I recognized at least
> one problem. The Fortran code can start with a program statement or a
> function (subroutine) statement. As both can begin with "INTEGER",
> "TYPE", "REAL", "DOUBLE" etc. (when you do not write the "PROGRAM"
> keyword in the program statement) there is an LL(1) conflict.
Arthur Sale had a paper in the Computer Journal (the journal of the
British Computer Society) which showed how to recognise statement
types back in the early seventies. This was for Fortran 66. I believe
my copy of the paper is now in a galaxy far far away. However. as I
recall, you can (with a suitable waving of hands) build a finite state
automaton in which many characters are "uninteresting" and thrown away
while others make transitions which eventually end up in an accept
state which identifies the statement kind.
Some examples of difficult statements.
10 FORMAT(I3) = 10
10 FORMAT(I3)
The first of these is an assignment, the second a format
statement. One compiler I used used to treat them both as format
statements and issue the warning "garbage after closing right
parenthesis" for statement one.
DO 10 I=1.3
DO 10 I=1,3
The first of these is an assignment statement the second a DO
header. Keypunch operators could never tell the difference either.
EQUIVALENCE (A,B)
EQUIVALENCE (A,B) = A.EQ.B
(for compilers that allow longer than 6 character names). The first of
these is an equivalence statement, the second a statement function.
By the way. Strictly speaking LL(1) and recursive descent accept
slightly different grammars because in Recursive Descent you can
usually delay making a decision about what production you have until
part way through the right hand side. As a result, Recursive descent
will work with grammars that need to be left factored to get them to
be LL. Putting the grammar into LL form before implementing it with a
recursive descent parser will increase its size and reduce its speed.
[To parse Fortran, you have to tell whether a statement is an
assignment (or statement function) or something else. First, if you
accept the old 3Hfoo Hollerith constants, you strip them out, being
careful not to be confused by REAL*4HELLO. Then you look for an equal
sign not protected by parentheses, and not followed by a comma which
also must not be protected by parentheses. If you find the equal
sign, and no comma, it's an assignment or a statement function. If
not, it's something else. Once you've decided that, the lexing and
parsing are pretty straightforward, with the parser at each stage
having to tell the lexer what kind of tokens to look for. See my
sample Fortran subset parser in the archives for an example of all
this nonsense. -John]
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.