# Re: A Low-Rent Syntax Problem

## ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe) 31 Aug 90 08:32:10 GMT

From comp.compilers

Related articles
A Low-Rent Syntax Problem mcdaniel@adi.com (1990-08-28)
Re: A Low-Rent Syntax Problem adamsf@turing.cs.rpi.edu (1990-08-30)
Re: A Low-Rent Syntax Problem ok@goanna.cs.rmit.OZ.AU (1990-08-31)
Re: A Low-Rent Syntax Problem brnstnd@kramden.acf.nyu.edu (1990-09-04)

Newsgroups: comp.compilers
From: ok@goanna.cs.rmit.OZ.AU (Richard A. O'Keefe)
Keywords: lex, parse, design
Organization: Comp Sci, RMIT, Melbourne, Australia
References:
Date: 31 Aug 90 08:32:10 GMT

> But there are other syntactic structures in the language, and I'd like
> to use a lex- or flex-like lexer with a bison-like grammar. If I have
> just the non-terminals EOL (end of line), LPAREN, RPAREN, ASSIGN, and
> TEXT, I can't distinguish
> c = 12(a)45
> from
> c = 12 (a) 45
> because the lexer would return
> TEXT ASSIGN TEXT LPAREN TEXT RPAREN TEXT EOL
> in both cases.

Frankly, I think this is more than somewhat ugly. Having used SNOBOL,
it is obvious to me that "c = 12 (a) 45" is concatenating the strings
"12", the value of a, and "45". AWK uses the SNOBOL convention here;
try the AWK program
    BEGIN { a = "--" }
    END { print 12 (a) 45 }
so a lot of UNIX hackers may be very surprised by your syntax.

If you want to distinguish "12(a)45" from "12 (a) 45", surely the
simplest way is to make the brackets different:
    /{LAYOUT}(/ --> LEFT_PLAIN
    /){LAYOUT}/ --> RIGHT_PLAIN
    /(/         --> LEFT_CONCAT
    /)/         --> RIGHT_CONCAT
I haven't bothered to check how exactly you would say this in Lex, but
the "longest-match" rule would make it work. You would hallucinate a
newline at the beginning of the file and treat newline as layout, so
that
    (c) = 12
would be tokenised as LEFT_PLAIN TEXT RIGHT_PLAIN ASSIGN TEXT EOL.
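
For what it's worth, here is a minimal sketch of that bracket scheme in
Python rather than Lex. Everything beyond the four bracket token names is
an assumption made for illustration: layout is taken to be space, tab, and
newline, TEXT is any run of other characters, and the name tokenise is
hypothetical. Python's alternation takes the first alternative that matches,
not the longest, so the PLAIN rules are listed ahead of the CONCAT ones.
Lookaround stands in for both the consumed layout and the hallucinated
leading newline, so that a real newline can still come back as EOL.

```python
import re

# Assumption: "layout" means space, tab, or newline.
NOT_LAYOUT = r"[^ \t\n]"

# Order matters: Python picks the first alternative that matches,
# so the more specific PLAIN brackets come before the CONCAT ones.
TOKEN_SPEC = [
    ("LEFT_PLAIN",   rf"(?<!{NOT_LAYOUT})\("),  # "(" at start of input or after layout
    ("RIGHT_PLAIN",  rf"\)(?!{NOT_LAYOUT})"),   # ")" at end of input or before layout
    ("LEFT_CONCAT",  r"\("),                    # any other "("
    ("RIGHT_CONCAT", r"\)"),                    # any other ")"
    ("ASSIGN",       r"="),
    ("EOL",          r"\n"),
    ("SKIP",         r"[ \t]+"),                # layout within a line, discarded
    ("TEXT",         r"[^ \t\n()=]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenise(source):
    """Return the list of token names for source, dropping in-line layout."""
    return [m.lastgroup for m in TOKEN_RE.finditer(source) if m.lastgroup != "SKIP"]


for line in ("c = 12(a)45\n", "c = 12 (a) 45\n", "(c) = 12\n"):
    print(repr(line), " ".join(tokenise(line)))
```

With these rules the first line yields LEFT_CONCAT and RIGHT_CONCAT, the
second LEFT_PLAIN and RIGHT_PLAIN, and the third comes out as
LEFT_PLAIN TEXT RIGHT_PLAIN ASSIGN TEXT EOL, so the grammar can tell the
two cases apart.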

> ? Does it continue the line? Does it continue the comment?
> (The latter is gross:
> a = 1 2 3 4 // comment\
> 5 6 7 8
> would silently comment out the second line!)

The Bourne shell equivalent of this (use # instead of //) _neither_
continues the line _nor_ continues the comment (the \ is swallowed by
the comment). The ANSI C rule about \<newline> is that those two
characters disappear very early in processing; my interpretation is
that if // were added to ANSI C it would have to comment out the next
line when used like this.
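
The difference comes down to which step runs first. A rough Python sketch
of the two orderings follows; the function names are hypothetical, and it
ignores quoting, string literals, and everything else a real shell or C
front end has to deal with, so it only mimics the effect described above.

```python
import re


def splice_then_strip(source, marker="//"):
    """ANSI-C-like order: backslash-newline pairs vanish first, comments later."""
    spliced = source.replace("\\\n", "")
    return re.sub(re.escape(marker) + r"[^\n]*", "", spliced)


def strip_then_splice(source, marker="//"):
    """Bourne-shell-like effect: the comment swallows the trailing backslash."""
    stripped = re.sub(re.escape(marker) + r"[^\n]*", "", source)
    return stripped.replace("\\\n", "")


example = "a = 1 2 3 4 // comment\\\n5 6 7 8\n"
print(repr(splice_then_strip(example)))  # second line is lost inside the comment
print(repr(strip_then_splice(example)))  # second line survives as its own line
```

Splicing first glues "5 6 7 8" onto the comment and then deletes it along
with the comment; stripping first leaves two separate lines.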

Surely the simplest rule would be to make your end-of-line comment
marker a single character (and it would be consistent with most UNIX
tools if that character were '#') and then you could easily handle
\#<comment><newline> as if it were \<newline>.
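
A minimal sketch of that rule in Python, assuming '#' as the comment
character and no quoting mechanism (both assumptions, as is the name
join_lines). A single left-to-right pass is used so that a backslash
which falls inside an ordinary comment is swallowed by the comment and
cannot continue the line.

```python
import re

# One pass, leftmost match wins:
#   \<newline> or \#<comment><newline>  -> line continuation, removed entirely
#   #<comment>                          -> ordinary comment, removed up to (not
#                                          including) the newline
LINE_NOISE = re.compile(r"\\(?:#[^\n]*)?\n|#[^\n]*")


def join_lines(source):
    """Remove comments and join continued lines in a single scan."""
    return LINE_NOISE.sub("", source)


print(repr(join_lines("a = 1 2 3 4 \\# first half\n5 6 7 8 # done\n")))
print(repr(join_lines("a = 1 2 3 4 # comment\\\n5 6 7 8\n")))
```

The first call joins the two lines into one statement; the second keeps
them separate, because there the backslash is part of the comment.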
--
