Subject: | Re: Languages with optional spaces |
From: | Kaz Kylheku <493-878-3164@kylheku.com> |
Newsgroups: | comp.compilers |
Date: | Wed, 26 Feb 2020 08:06:04 +0000 (UTC) |
Organization: | Aioe.org NNTP Server |
References: | 20-02-015 |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="4321"; mail-complaints-to="abuse@iecc.com" |
Keywords: | lex, Basic, history |
Posted-Date: | 27 Feb 2020 17:33:44 EST |
On 2020-02-19, Maury Markowitz <maury.markowitz@gmail.com> wrote:
> I'm trying to write a lex/yacc (flex/bison) interpreter for classic BASICs
> like the original DEC/MS, HP/DG etc. I have it mostly working for a good chunk
> of 101 BASIC Games (DEF FN is the last feature to add).
>
> Then I got to Super Star Trek. To save memory, SST removes most spaces, so
> lines look like this:
>
> 100FORI=1TO10
>
> Here's my current patterns that match bits of this line:
>
> FOR { return FOR; }
>
> [:,;()\^=+\-*/\<\>] { return yytext[0]; }
>
> [0-9]*[0-9.][0-9]*([Ee][-+]?[0-9]+)? {
> yylval.d = atof(yytext);
> return NUMBER;
> }
>
> "FN"?[A-Za-z@][A-Za-z0-9_]*[\$%\!#]? {
> yylval.s = g_string_new(yytext);
> return IDENTIFIER;
> }
>
> These correctly pick out some parts, numbers and = for instance, so it sees:
>
> 100 FORI = 1 TO 10
>
> The problem is that FORI part. Some BASICs allow variable names with more than
> two characters, so in theory, FORI could be a variable. These BASICs outlaw
> that in their parsers; any string that starts with a keyword exits then, so
> this would always parse as FOR. In lex, FORI is longer than FOR, so it returns
> a variable token called FORI.
>
> Is there a way to represent this in lex? Over on Stack Overflow the only
> suggestion seemed to be to use trailing syntax on the keywords, but that
> appears to require modifying every one of simple patterns for keywords with
> some extra (and ugly) syntax. Likewise, one might modify the variable name
> pattern, but I'm not sure how one says "everything that doesn't start with one
> of these other 110 patterns".
Two ideas:

1. Forget about recognizing whole variable names in the lexer. Instead,
   recognize only the constituent characters of a variable name there,
   returning each one as its own token. Then, in the parser, have a
   grammar production that assembles those characters into a variable:

       variable : VARCHAR
                | variable VARCHAR
                ;

2. Use regex patterns in the lexer to recognize just the keywords,
   as above. Then handle variable names with a rule that matches
   just one letter A-Z, whose lex action performs ad-hoc lexical
   analysis using C logic. At that point you know that you do not
   have a keyword, because no keyword rule matched. You can read
   further characters using input() (spelled yyinput() in a C++
   scanner) and accumulate a variable name.

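To make idea (1) concrete, here is a minimal sketch in plain C rather than lex/yacc, so that it compiles on its own. It imitates a lexer that tries the keyword rules first and otherwise hands out one character at a time; the loop in token_length stands in for the `variable` production. The keyword table and the names keyword_at, token_length and split_tokens are made up for this illustration, and a real BASIC would have a much longer table.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table, just enough for the example line. */
static const char *const keywords[] = { "FOR", "TO", "NEXT", "STEP" };

/* Length of the keyword that starts at s, or 0 if none does. */
size_t keyword_at(const char *s)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i]);
        if (strncmp(s, keywords[i], n) == 0)
            return n;
    }
    return 0;
}

/* Length of the token that starts at s; s must not be at end of string. */
size_t token_length(const char *s)
{
    size_t n = keyword_at(s);

    if (n > 0)
        return n;                       /* a keyword rule fired */
    if (isdigit((unsigned char)s[0])) {
        n = 1;
        while (isdigit((unsigned char)s[n]))
            n++;
        return n;                       /* integer literal */
    }
    if (isalpha((unsigned char)s[0])) {
        /* variable : VARCHAR | variable VARCHAR
           Each further letter counts only if no keyword starts there. */
        n = 1;
        while (isalpha((unsigned char)s[n]) && keyword_at(s + n) == 0)
            n++;
        return n;
    }
    return 1;                           /* operator or punctuation */
}

/* Copy the tokens of src into out, separated by single spaces.
   Returns 0 on success, -1 if out (cap bytes) is too small. */
int split_tokens(const char *src, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    while (*src != '\0') {
        if (*src == ' ') {              /* spaces are optional: skip them */
            src++;
            continue;
        }
        size_t n = token_length(src);
        size_t sep = used > 0 ? 1 : 0;
        if (used + sep + n + 1 > cap)
            return -1;
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, src, n);
        used += n;
        src += n;
    }
    out[used] = '\0';
    return 0;
}
```

Calling split_tokens("100FORI=1TO10", buf, sizeof buf) leaves "100 FOR I = 1 TO 10" in buf. One consequence of this scheme is that a name such as TOTAL comes out as TO followed by TAL, so names cannot contain keywords.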
A variant of technique (2) is used for scanning C comments,
as an alternative to an ugly regular expression:
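    "/*"    {
                int c;

                /* input() is spelled yyinput() in a C++ scanner. Depending
                   on the flex version it returns EOF or 0 at end of input. */
                while ((c = input()) != EOF && c != 0)
                {
                    if (c == '\n') {
                        /* increment line number or something */
                    }
                    else if (c == '*')
                    {
                        if ((c = input()) == '/')
                            break;
                        else if (c != EOF && c != 0)
                            unput(c);   /* might be the '*' of a closing star-slash */
                    }
                }
            }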
The above is an adaptation of something from an old Flex manual.
IIRC the Dragon Book has a similar example of ad-hoc logic
in a lex rule for handling C comments.
You can see that it's a similar idea. We use a regex to partially match
the comment, just the /* opening. Then we take over from there.
I have a hunch this would work for fetching variables like FORI, when
there is no match on a keyword like FOR.
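Here is a rough sketch of what that ad-hoc logic for idea (2) might look like, again as standalone C so it can be compiled without flex. The struct source and the helpers next_char and push_back are hypothetical stand-ins for the scanner's input() and unput(); keyword_ahead and scan_variable are likewise invented names, and the keyword list is only a sample. The one assumption worth flagging is that the accumulation has to stop when a keyword begins, otherwise BTHEN would be swallowed as one name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's input stream. */
struct source {
    const char *text;
    size_t pos;
};

/* Plays the role of input(): next character, or 0 at end of input. */
int next_char(struct source *s)
{
    unsigned char c = (unsigned char)s->text[s->pos];

    if (c != '\0')
        s->pos++;
    return c;
}

/* Plays the role of unput(): give the last character back. */
void push_back(struct source *s)
{
    if (s->pos > 0)
        s->pos--;
}

/* Nonzero if a keyword starts at the current position. Peeking at the
   buffer directly is a shortcut; a flex action would have to read ahead
   with input() and then unput() the characters again. */
int keyword_ahead(const struct source *s)
{
    static const char *const keywords[] = { "FOR", "TO", "NEXT", "THEN" };
    const char *p = s->text + s->pos;

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strncmp(p, keywords[i], strlen(keywords[i])) == 0)
            return 1;
    return 0;
}

/* To be called after the one-letter rule has matched `first`.
   Accumulates the rest of the name into name (cap bytes, cap >= 2)
   and returns its length. A name longer than cap - 1 is cut off and
   the remainder stays in the input. */
size_t scan_variable(struct source *s, int first, char *name, size_t cap)
{
    size_t len = 0;

    name[len++] = (char)first;
    while (len + 1 < cap && !keyword_ahead(s)) {
        int c = next_char(s);

        if (c == 0)
            break;                      /* end of input */
        if (!isalnum(c)) {
            push_back(s);               /* not part of the name */
            break;
        }
        name[len++] = (char)c;
    }
    name[len] = '\0';
    return len;
}
```

With the input positioned just after the B of BTHENX, scan_variable returns the one-letter name B and leaves THEN for the keyword rule. With COUNT=0 it collects COUNT and stops at the = sign.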
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1