a generated parser/scanner for fixed form FORTRAN

Evangelos Drikos <drikosev@otenet.gr>
Wed, 17 Jul 2013 17:55:22 +0300

          From comp.compilers

Newsgroups: comp.compilers
Organization: An OTEnet S.A. customer
Keywords: parse, Fortran, LALR
Posted-Date: 17 Jul 2013 14:07:26 EDT

Hello,


The parser/scanner generator "Syntaxis.jar" supports a new feature that
enables an LALR' parser and a generated scanner to parse fixed form
FORTRAN programs. I think this feature is not specific to FORTRAN.


Below, I describe in detail the lexical issues and then the grammar
modifications that a FORTRAN 2008 free source form LALR' parser/scanner
requires so that it can also parse fixed form programs. The lexical issues
I've identified fall into three categories:


1) Statements that don't have delimiters (e.g. GO TO label, STOP code).
2) Keywords followed by another keyword or a name before a delimiter.
        2.1) before '=' (NON; e.g. NON INTRINSIC and NON OVERRIDABLE).
        2.2) before '(' (SYNC, IMPLICIT, SUBROUTINE, FUNCTION, DO, CALL, prefix,
             GO, END_INTERFACE, DIMENSION, COMMON, DATA, ENTRY).
        2.3) before '%' (CALL or DATA; e.g. CALL a%b).
        2.4) before '[' (CODIMENSION; e.g. CODIMENSION a[*]).
3) Special cases.
        3.1) The keyword FORMAT (an issue in both fixed & free form).
        3.2) The well known DO issue (DO [label] name=exp,exp).
        3.3) An integer before a binary-defined-operator.
        3.4) An entity declaration that looks like a function statement.
        3.5) Hollerith Constants (FORTRAN 77).
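
All of these issues stem from the fact that blanks are insignificant in
fixed form, so the scanner effectively sees each statement with its blanks
removed. The small Python illustration below (not part of Syntaxis.jar;
the helper name squeeze is hypothetical) shows how a DO loop and an
assignment collapse to nearly identical text, and how GO TO loses any
delimiter. It deliberately ignores character and Hollerith constants,
whose blanks are significant.

```python
def squeeze(stmt: str) -> str:
    """Drop every blank and fold to upper case, roughly as a fixed form
    scanner sees a statement.

    Simplification: blanks inside character or Hollerith constants are
    significant, and a real file reader must preserve them.
    """
    return stmt.replace(" ", "").upper()


print(squeeze("      DO 10 I = 1, 10"))  # DO10I=1,10  (a DO loop)
print(squeeze("      DO 10 I = 1. 10"))  # DO10I=1.10  (assigns 1.10 to DO10I)
print(squeeze("      GO TO 10"))         # GOTO10      (no delimiter after GO)
```

Only the comma versus the dot tells the first two statements apart, and
it appears after the '=', which is why a lookahead past the name is needed.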


First, we modify the grammar to distinguish the names used at the
beginning of a statement (name-l) from the names used elsewhere (name-l
or name-r). Then we can solve the lexical issues per category:


1) A name-l must be followed by ':', '(', '%', '=', or '['.
2) For the concatenated keywords/names before '=', '(', '[', and '%' we:
        2.1) accept optional spacing; it also needs a semantic action.
        2.2-2.4) scan for "]=", "%name=", ")="; if not found, it is a name-r.
3) For each special case:
        3.1) We scan for "]=", "%name=", or ")="; if found it is a name-l.
        3.2) If a name-l like "DO*" is followed by "=exp," it must be further
        followed by "name=". Since expressions are not regular (a finite
        automaton cannot match arbitrarily nested parentheses), we set a limit
        of four levels of nested parentheses for the first expression of the
        "loop-control". If the limit is exceeded, the lexer returns a name-l.
        3.3) A real-literal-constant cannot be followed by a letter or a dot.
        3.4) We use a token name-f that begins with FUNCTION/SUBROUTINE and
        is followed by '('name-list/dummy-arg-list')'. If the parser cannot
        shift the keyword FUNCTION, we return a name-l (semantic action).
        3.5) We parse Hollerith constants in the hand-coded file reader.
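
Rules 1, 2.2-2.4, 3.1 and 3.2 amount to a bounded lookahead at the start
of each statement. The Python sketch below is one possible reading of
them, not the code that Syntaxis.jar generates. The names is_name_l,
skip_group, has_top_level_comma and MAX_DEPTH are hypothetical; the input
is assumed to be already blank-squeezed and upper-cased; character
constants are ignored; and rule 3.2 is approximated by the classic test
(a comma at parenthesis depth 0 after "DO...=" means a DO statement),
keeping the four-level nesting limit but not modelling the extra "name="
condition.

```python
import re

NAME = re.compile(r"[A-Z][A-Z0-9_]*")
MAX_DEPTH = 4  # hypothetical constant: nesting limit from rule 3.2


def skip_group(s: str, i: int) -> int:
    """s[i] is '(' or '['; return the index just past its partner, or -1."""
    open_ch = s[i]
    close_ch = ")" if open_ch == "(" else "]"
    depth = 0
    for j in range(i, len(s)):
        if s[j] == open_ch:
            depth += 1
        elif s[j] == close_ch:
            depth -= 1
            if depth == 0:
                return j + 1
    return -1


def has_top_level_comma(s: str, i: int) -> bool:
    """True if the expression starting at s[i] is followed by ',' at depth 0."""
    depth = 0
    for ch in s[i:]:
        if ch in "([":
            depth += 1
            if depth > MAX_DEPTH:
                return False  # give up; the caller then reports a name-l
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False


def is_name_l(stmt: str) -> bool:
    """Does the blank-squeezed statement begin with a name-l?"""
    m = NAME.match(stmt)
    if m is None:
        return False
    name_end = i = m.end()
    # rule 1 with ':': a construct name such as LOOP:DOI=1,N
    # ("::" belongs to a declaration, so it does not count)
    if stmt.startswith(":", i) and not stmt.startswith("::", i):
        return True
    # rules 2.2-2.4 and 3.1: skip (...), [...] and %name parts,
    # i.e. look for ")=", "]=" or "%name="
    while i < len(stmt) and stmt[i] in "([%":
        if stmt[i] == "%":
            m = NAME.match(stmt, i + 1)
            if m is None:
                return False
            i = m.end()
        else:
            i = skip_group(stmt, i)
            if i < 0:
                return False
    if not stmt.startswith("=", i):
        return False
    # rule 3.2 (approximated): DO<label><var>=exp,... is a DO statement
    if i == name_end and stmt.startswith("DO"):
        return not has_top_level_comma(stmt, i + 1)
    return True
```

For example, is_name_l("DO10I=1.10") is true (an assignment), while
is_name_l("DO10I=1,10"), is_name_l("GOTO10") and is_name_l("CALLA%B")
are false, so those names are reported as name-r tokens.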


The new feature mentioned above is that the lexer can return a shorter
match as an alternative token. Since the parser cannot shift a name-r at
the beginning of a statement, it can optionally request a shorter match.
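
A toy Python model of this fallback, under loudly simplified assumptions:
the hypothetical candidates function lists every match at the start of a
statement, longest first (the whole name, then keyword prefixes);
first_token offers them to a parser callback until one is accepted; and
at_statement_start stands in for the LALR' action table by rejecting any
name-r. KEYWORDS is a hypothetical subset, and the leading name is
assumed to have been classified as a name-r already.

```python
import re
from typing import Callable, List, Optional, Tuple

Token = Tuple[str, str]  # (token kind, matched text)
NAME = re.compile(r"[A-Z][A-Z0-9_]*")
KEYWORDS = ["GO", "CALL", "DO", "IF", "STOP"]  # hypothetical subset


def candidates(stmt: str) -> List[Token]:
    """All matches at position 0, longest first: the name, then keywords."""
    result: List[Token] = []
    m = NAME.match(stmt)
    if m:
        result.append(("name-r", m.group()))
    for kw in sorted(KEYWORDS, key=len, reverse=True):
        if stmt.startswith(kw):
            result.append(("keyword", kw))
    return result


def first_token(stmt: str, can_shift: Callable[[Token], bool]) -> Optional[Token]:
    """Offer the longest match first; on rejection fall back to a shorter one."""
    for tok in candidates(stmt):
        if can_shift(tok):
            return tok
    return None


def at_statement_start(tok: Token) -> bool:
    # a name-r cannot begin a statement, so the parser asks for a shorter match
    return tok[0] != "name-r"


print(first_token("GOTO10", at_statement_start))   # ('keyword', 'GO')
print(first_token("CALLA%B", at_statement_start))  # ('keyword', 'CALL')
```

Keeping the shorter matches as alternatives, rather than deciding in the
scanner alone, lets the parser state settle ambiguities that no finite
lookahead in the scanner could.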


To validate the solution, I extended the grammar with three obsolete
FORTRAN 77 statements (ASSIGNED GOTO, ASSIGN, and PAUSE) and tested it
with the programs found at: www.itl.nist.gov/div89/ctg/fortran_form.htm


Clearly, the table-driven scanner has some disadvantages, but one can
mechanically translate it into a hard-coded, optimized scanner.


Regards,
Ev. Drikos

