Re: Incomplete last line won't match

Tim Van Holder <tim.vanholder@falconsoft.be>
21 Jun 2001 03:12:29 -0400

          From comp.compilers


From: Tim Van Holder <tim.vanholder@falconsoft.be>
Newsgroups: comp.compilers
Date: 21 Jun 2001 03:12:29 -0400
Organization: Anubex N.V.
References: 01-06-032
Keywords: lex
Posted-Date: 21 Jun 2001 03:12:29 EDT

Francis Harvey wrote:
>
> Greetings,
>
> Using Flex 2.5.4 and Berkeley Yacc, I have finished a fairly complete
> program for analyzing the syntax and logic of a user's source file.
> Unfortunately, my program has one glaring flaw. If a user provides a
> source file whose last line does not have a newline character at the
> end, the parser will fail to recognize all of the rules that it should
> and the last line ends up getting echoed to my output file. All of my
> other rules and grammar will be fulfilled, but for the last line only
> the initial condition rule (.*\n?) will be matched and then no other.
> If I simply add a newline to the last line of the file, the code then
> works, but I can't seem to get this to work in my program using unput
> without getting a buffer error. Any suggestions are most appreciated.
>
> For example, imagine these are the last 3 lines of the source file
> where the last line does not have a newline character after the last
> letter (testing on the UNIX):
>
> V BL24C 10 A 24 0082-0091
> Q BLANK FIELD
> C ++++++++++ = INAPPLICABLE, ALWAYS BLANK
>
> The simplified applicable rules should be:
>
> .*\n? {
> BEGIN COED;
> yyless(0);
> }
Well, you could simply strip the newline here if it exists.
Or you could use '.*' (which, IIRC, will never match a newline), and
match a solitary newline with a 'skip this' rule.
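As a rough sketch of the first option (stripping the newline if it exists), a small C helper could compute the length of the match without its trailing newline. The name length_without_newline is made up for illustration and is not part of flex; inside an action you would pass it yytext and yyleng.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of flex: returns the length of the
 * matched text with a single trailing '\n' dropped, if one is present.
 * A last line without a newline is returned unchanged. */
size_t length_without_newline(const char *text, size_t len)
{
    assert(text != NULL);
    if (len > 0 && text[len - 1] == '\n')
        return len - 1;
    return len;
}
```

In an action this would be called as length_without_newline(yytext, (size_t) yyleng), so that a complete line and an incomplete last line yield the same text.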


>
> <COED>^V{ws} {
> BEGIN V;
> return VSYM;
> }
This seems a little strict; it would cause a line like


    V blah blah


not to be matched (due to the excess space).
Also, the ^ seems unnecessary (it's implied by the COED state).


You could just have


<COED>V { BEGIN V; return VSYM; }
<COED>W { BEGIN W; return WSYM; }
<COED>X { BEGIN X; return XSYM; }
<COED>. { yyerror ("unsupported COED"); exit (1); } /* or whatever */


You could even merge the first three as


<COED>[VWX] { /* use a switch(yytext[0]) to decide what to do */ }


but I'm unsure which of the two approaches is more efficient.


> <V>{ident}{ws}{vstat} {
> BEGIN 0;
> return VSTAT;
> }


You're really doing part of byacc's work here - simply skip all
whitespace and match {ident} and {vstat} separately (unless they
couldn't otherwise be told apart).
byacc would then check syntax by


v: VSYM IDENT VSTAT ;


Of course, this all assumes that the 'one-space' requirement you use
isn't strict; but even if it is, you could always defer the order
check until the parse phase:


.* { BEGIN COED; yyless (0); }


<COED>V { BEGIN V; return VSYM; }
<COED>Q { BEGIN Q; return QSYM; }


<V>{ident} { return IDENT; }
<V>{vstat} { BEGIN 0; return VSTAT; }
<Q>.+ { BEGIN 0; return QSTAT; }


<*>{ws} { return BLANK; }


\n { /* ignored - implied by the .* match-all rule above */ }


. { yyerror("parse error: unexpected character"); exit(1); }


byacc:


v: VSYM BLANK IDENT BLANK VSTAT ;
q: QSYM BLANK QSTAT ;


Of course, this is just typed off the top of my head - glaring errors
may be present and this code may cause a small country to explode.


--
Tim Van Holder - Anubex N.V.

