Related articles:
Incomplete last line won't match HARVEYF1@WESTAT.com (Francis Harvey) (2001-06-14)
Re: Incomplete last line won't match tim.vanholder@falconsoft.be (Tim Van Holder) (2001-06-21)
Re: Incomplete last line won't match wzzhu@csis.hku.hk (Zhu Wenzhang) (2001-06-21)

From: Tim Van Holder <tim.vanholder@falconsoft.be>
Newsgroups: comp.compilers
Date: 21 Jun 2001 03:12:29 -0400
Organization: Anubex N.V.
References: 01-06-032
Keywords: lex
Posted-Date: 21 Jun 2001 03:12:29 EDT

Francis Harvey wrote:
>
> Greetings,
>
> Using Flex 2.5.4 and Berkeley Yacc, I have finished a fairly complete
> program for analyzing the syntax and logic of a user's source file.
> Unfortunately, my program has one glaring flaw. If a user provides a
> source file whose last line does not have a newline character at the
> end, the parser will fail to recognize all of the rules that it should
> and the last line ends up getting echoed to my output file. All of my
> other rules and grammar will be fulfilled, but for the last line only
> the initial condition rule (.*\n?) will be matched and then no other.
> If I simply add a newline to the last line of the file, the code then
> works, but I can't seem to get this to work in my program using unput
> without getting a buffer error. Any suggestions are most appreciated.
>
> For example, imagine these are the last 3 lines of the source file
> where the last line does not have a newline character after the last
> letter (testing on the UNIX):
>
> V BL24C 10 A 24 0082-0091
> Q BLANK FIELD
> C ++++++++++ = INAPPLICABLE, ALWAYS BLANK
>
> The simplified applicable rules should be:
>
> .*\n? {
> BEGIN COED;
> yyless(0);
> }
Well, you could simply strip the newline here if it exists.
Or you could use '.*' (which, IIRC, will never match a newline), and
match a solitary newline with a 'skip this' rule.
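
To see why splitting the newline off from the line body helps, here is a small plain-C sketch (not flex code) that mimics the two rules: one step takes everything up to, but not including, a newline, the way '.*' does, and a separate step skips a lone newline if there is one. The function name count_lines is made up for this illustration and does not appear in the original scanner.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mimicking the two suggested rules.
   Returns the number of non-empty line bodies seen in buf. */
static size_t count_lines(const char *buf)
{
    size_t lines = 0;
    while (*buf != '\0') {
        /* Like '.*': consume the line body, stopping before any '\n'. */
        size_t body = strcspn(buf, "\n");
        if (body > 0)
            ++lines;
        buf += body;
        /* Like the solitary-newline rule: skip one '\n' if present. */
        if (*buf == '\n')
            ++buf;
    }
    return lines;
}
```

Because the newline is handled on its own, count_lines("a\nb\nc") and count_lines("a\nb\nc\n") both see three line bodies; the last line is treated the same whether or not it is terminated.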
>
> <COED>^V{ws} {
> BEGIN V;
> return VSYM;
> }
This seems a little strict; assuming {ws} matches exactly one space, a
line like
V    blah blah
would not be matched (due to the excess space).
Also, the ^ seems unnecessary (it's implied by the COED state).
You could just have
<COED>V { BEGIN V; return VSYM; }
<COED>W { BEGIN W; return WSYM; }
<COED>X { BEGIN X; return XSYM; }
<COED>. { yyerror ("unsupported COED"); exit (1); } /* or whatever */
You could even merge the first three as
<COED>[VWX] { /* use a switch(yytext[0]) to decide what to do */ }
but I'm unsure which of the two approaches is more efficient.
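
As a rough idea of what that switch could look like, here is a plain-C sketch. The token values and the helper name coed_token are assumptions for illustration only; in a real build the token codes would come from the header byacc generates, and the body would sit inside the flex action, with a BEGIN for the matching start condition in each case.

```c
/* Hypothetical token codes; a real scanner would take these from the
   header generated by byacc (typically y.tab.h). */
enum { VSYM = 258, WSYM, XSYM };

/* Map the record-type letter (yytext[0] inside the flex action) to its
   token. Returns -1 for a letter the scanner does not support. */
static int coed_token(char type)
{
    switch (type) {
    case 'V': return VSYM;   /* the action would also do BEGIN V; */
    case 'W': return WSYM;   /* ... BEGIN W; */
    case 'X': return XSYM;   /* ... BEGIN X; */
    default:  return -1;     /* caller reports "unsupported COED" */
    }
}
```

Inside the merged rule this would be called as coed_token(yytext[0]), with the error branch doing whatever the separate catch-all rule did.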
> <V>{ident}{ws}{vstat} {
> BEGIN 0;
> return VSTAT;
> }
You're really doing part of byacc's work here - simply skip all
whitespace and match {ident} and {vstat} separately (unless they
couldn't otherwise be told apart).
byacc would then check syntax by
v: VSYM IDENT VSTAT ;
Of course, this all assumes that the 'one-space' requirement you use
isn't strict; but even if it is, you could always defer the order check
until the parse phase:
.* { BEGIN COED; yyless (0); }
<COED>V { BEGIN V; return VSYM; }
<COED>Q { BEGIN Q; return QSYM; }
<V>{ident} { return IDENT; }
<V>{vstat} { BEGIN 0; return VSTAT; }
<Q>.+ { BEGIN 0; return QSTAT; }
<*>{ws} { return BLANK; }
\n { /* ignored - implied by the .* match-all rule above */ }
. { yyerror("parse error: unexpected character"); exit(1); }
byacc:
v: VSYM BLANK IDENT BLANK VSTAT ;
q: QSYM BLANK QSTAT ;
Of course, this is just typed off the top of my head - glaring errors
may be present and this code may cause a small country to explode.
--
Tim Van Holder - Anubex N.V.