A Low-Rent Syntax Problem

mcdaniel@adi.com (Tim McDaniel)
28 Aug 90 18:46:47 GMT

From comp.compilers

Newsgroups:	comp.compilers
From:	mcdaniel@adi.com (Tim McDaniel)
Keywords:	lex, parse, design
Organization:	Applied Dynamics International, Inc.; Ann Arbor, Michigan, USA
Date:	28 Aug 90 18:46:47 GMT

The idea of "low-rent syntax" (avoiding semicolons and other "noise
tokens") sounds appealing, but it turns out to be difficult in
practice. I'm having trouble working out some issues.

I'm designing a small language, much like a simple shell. The
language has assignment statements and the datatype "list of strings".

The syntax I'd like is, e.g.:
        a = 3
        b = 1 2 (a) 4 5
"a" has a one-element value, "3", while "b" has 5 strings in its
value: "1", "2", "3", "4", "5". A parenthesized name is replaced by
that variable's value, and whitespace separates words, as in REXX or
a UNIX shell.
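The evaluation rule just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the language's actual implementation. It assumes variables live in a dict mapping a name to a list of strings, and that a word consisting only of "(name)" splices in every element of that variable. The names `eval_list` and `Env` are hypothetical. The sketch deliberately ignores abutment, so "12(a)45" would stay a literal word here.

```python
import re

# Hypothetical environment: each variable holds a list of strings.
Env = dict[str, list[str]]

# A whole word of the form "(name)" refers to a variable.
VAR_REF = re.compile(r"\((\w+)\)")


def eval_list(rhs: str, env: Env) -> list[str]:
    """Split a right-hand side on whitespace and substitute (name) words."""
    result: list[str] = []
    for word in rhs.split():          # whitespace separates words
        m = VAR_REF.fullmatch(word)
        if m:
            result.extend(env[m.group(1)])  # splice in every element
        else:
            result.append(word)
    return result


env: Env = {"a": eval_list("3", {})}
env["b"] = eval_list("1 2 (a) 4 5", env)
print(env["b"])  # ['1', '2', '3', '4', '5']
```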

I'd like to provide string concatenation. The syntax I like is just
abutment without whitespace in between: in
        c = 12(a)45
"c"'s value would be a one-element list, with "12345" as the only
element. The Bourne-shell analogue is
        c=12${a}45
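Abutment as concatenation can be sketched as follows. This is a hypothetical helper, `eval_word`, that splits one whitespace-free word into literal pieces and "(name)" pieces and then joins them. The post does not say what "12(a)45" should mean when "a" holds several strings. This sketch therefore assumes, purely for illustration, that a variable used inside a longer word must hold exactly one element, and it raises an error otherwise.

```python
import re

# A piece is either a (name) reference or a run of literal characters.
PIECE = re.compile(r"\((\w+)\)|[^\s()]+")


def eval_word(word: str, env: dict[str, list[str]]) -> str:
    """Concatenate the abutted pieces of one word into a single string.

    Assumption: a variable referenced inside a word holds exactly one
    element; the original post does not define the other cases.
    """
    parts: list[str] = []
    pos = 0
    while pos < len(word):
        m = PIECE.match(word, pos)
        if m is None:
            raise SyntaxError(f"bad character at {pos} in {word!r}")
        if m.group(1) is not None:          # a (name) piece
            value = env[m.group(1)]
            if len(value) != 1:
                raise ValueError(f"({m.group(1)}) is not a one-element list")
            parts.append(value[0])
        else:                               # a literal piece
            parts.append(m.group(0))
        pos = m.end()
    return "".join(parts)


env = {"a": ["3"]}
print([eval_word("12(a)45", env)])  # ['12345']
```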

But there are other syntactic structures in the language, and I'd like
to use a lex- or flex-like lexer with a bison-like grammar. If I have
just the tokens (terminals) EOL (end of line), LPAREN, RPAREN, ASSIGN,
and TEXT, I can't distinguish
        c = 12(a)45
from
        c = 12 (a) 45
because the lexer would discard the whitespace and return
        TEXT ASSIGN TEXT LPAREN TEXT RPAREN TEXT EOL
in both cases.
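A small stand-in for such a lexer makes the problem concrete. It uses Python's `re` module rather than lex, which is an assumption made only to keep the example runnable. It also assumes a toy rule in which TEXT is any run of characters other than blanks, tabs, newline, parentheses and "=". The function name `lex` is hypothetical. Because whitespace is matched and then thrown away, both inputs yield the same token stream.

```python
import re

# Toy token set from the post; WS is matched but then discarded,
# as a lex rule with no action would do.
TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t]+)
  | (?P<EOL>\n)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<ASSIGN>=)
  | (?P<TEXT>[^ \t\n()=]+)
""", re.VERBOSE)


def lex(src: str) -> list[str]:
    """Return token kinds, dropping whitespace."""
    return [m.lastgroup for m in TOKEN_RE.finditer(src) if m.lastgroup != "WS"]


print(" ".join(lex("c = 12(a)45\n")))
# TEXT ASSIGN TEXT LPAREN TEXT RPAREN TEXT EOL
```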

Here are my ideas:

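- The lexer returns WS for whitespace. However, WS then crops up all
    over my grammar, as in
        null       ::=
        ws         ::= null | WS
        piece      ::= TEXT | LPAREN TEXT RPAREN
        word       ::= piece | word piece
        list       ::= word | list WS word
        assignment ::= TEXT ws ASSIGN ws list ws EOL
    where the lexer returns one WS for each run of blanks and tabs,
    abutted pieces form a single word, and WS separates words. This
    seems ugly.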

- The lexer returns CONCAT as the "implicit concatenation" operator.
    If a token that can end a word (TEXT or RPAREN) is immediately
    followed, with no whitespace in between, by one that can start a
    word (TEXT or LPAREN), the lexer returns CONCAT as the current token
    and returns the following source token on the next call instead.
    This seems kludgy.

    A problem is that the lexer may not be able to tell whether
    whitespace comes next. In the statement
        a = b//comment stuff
    after returning TEXT for "b", the lexer can only see "/" -- or can a
    flex lexer have more lookahead? It doesn't know that the "/" starts
    a comment, which should count as whitespace. One workaround is to
    force a comment-start to be surrounded by whitespace:
        a = b // comment stuff
    which is better-looking anyway.

- The user has to explicitly enter a concatenation operator.
    Unfortunately, that makes one more character "special", so it has
    to be quoted when used literally. It also clutters the statement:
        c = 12:(a):45
    or
        c = 12 : (a) : 45
    looks messier than c = 12(a)45.

- Using a hand-coded lexer or parser is not an option in my shop, alas.
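For the first idea (a WS token), a small Python sketch can check that keeping WS really does separate the two inputs from the original question. It assumes a toy token set in which TEXT is any run of characters other than blanks, tabs, newline, parentheses and "=", and a lexer that returns one WS per run of blanks. The recursive-descent parser only mirrors a WS-aware grammar in which abutted pieces form one word and WS separates words. It is an illustration, not a replacement for a bison grammar. All names (`lex_with_ws`, `parse_assignment`, `opt_ws`) are hypothetical.

```python
import re

# Hypothetical token set: TEXT is any run of characters other than
# blanks, tabs, newline, parentheses and "=".
TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t]+)          # one WS token per run of blanks and tabs
  | (?P<EOL>\n)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<ASSIGN>=)
  | (?P<TEXT>[^ \t\n()=]+)
""", re.VERBOSE)


def lex_with_ws(src: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs, keeping WS so the parser can see it."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(src)]


def parse_assignment(src: str) -> tuple[str, list[list[tuple[str, str]]]]:
    """Parse TEXT ws ASSIGN ws list ws EOL into (name, words).

    Each word is a list of pieces, ("TEXT", s) or ("VAR", name):
    abutted pieces form one word, and WS separates words.
    """
    toks = lex_with_ws(src)
    i = 0

    def peek():
        return toks[i][0] if i < len(toks) else None

    def take(kind):
        nonlocal i
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        i += 1
        return toks[i - 1][1]

    def opt_ws():
        # ws ::= null | WS; note how often this call is needed
        nonlocal i
        if peek() == "WS":
            i += 1

    name = take("TEXT")
    opt_ws()
    take("ASSIGN")
    opt_ws()
    words = []
    while True:
        pieces = []  # word ::= piece | word piece
        while peek() in ("TEXT", "LPAREN"):
            if peek() == "TEXT":
                pieces.append(("TEXT", take("TEXT")))
            else:
                take("LPAREN")
                pieces.append(("VAR", take("TEXT")))
                take("RPAREN")
        if not pieces:
            raise SyntaxError(f"expected a word, got {peek()}")
        words.append(pieces)
        opt_ws()  # either "list WS word" or the trailing ws before EOL
        if peek() == "EOL":
            break
    take("EOL")
    return name, words


print(parse_assignment("c = 12(a)45\n"))
# ('c', [[('TEXT', '12'), ('VAR', 'a'), ('TEXT', '45')]])
print(parse_assignment("c = 12 (a) 45\n"))
# ('c', [[('TEXT', '12')], [('VAR', 'a')], [('TEXT', '45')]])
```

The single token of lookahead after an optional WS (is it EOL or the start of another word?) is the same decision an LALR(1) parser would make for the trailing `ws EOL`.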
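The CONCAT idea can be sketched as a Python generator. Yielding CONCAT and then the held-back token plays the role of "return the next source token at the next call". In a C scanner this would typically need a saved pending token. The sketch uses the same assumed toy token set as before. It inserts CONCAT only between a word-ending token (TEXT, RPAREN) and a word-starting token (TEXT, LPAREN). Without that restriction, "(a)" would also get a CONCAT after its "(". `lex_with_concat` is a hypothetical name.

```python
import re
from collections.abc import Iterator

TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t]+)
  | (?P<EOL>\n)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<ASSIGN>=)
  | (?P<TEXT>[^ \t\n()=]+)
""", re.VERBOSE)

ENDS_WORD = {"TEXT", "RPAREN"}
STARTS_WORD = {"TEXT", "LPAREN"}


def lex_with_concat(src: str) -> Iterator[str]:
    """Yield token kinds, inserting CONCAT where two word parts abut."""
    prev = None  # kind of the last token yielded; whitespace resets it
    for m in TOKEN_RE.finditer(src):
        kind = m.lastgroup
        if kind == "WS":
            prev = None  # no CONCAT across whitespace
            continue
        if prev in ENDS_WORD and kind in STARTS_WORD:
            yield "CONCAT"  # the real token follows on the next call
        yield kind
        prev = kind


print(" ".join(lex_with_concat("c = 12(a)45\n")))
# TEXT ASSIGN TEXT CONCAT LPAREN TEXT RPAREN CONCAT TEXT EOL
```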
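On the "b//comment" lookahead question, one extra character of lookahead is enough. The Python sketch below lets TEXT contain "/" only when another "/" does not follow. It relies on a negative lookahead from Python's `re`, which is an assumption of this sketch. flex's trailing-context operator (`r/s`) offers a comparable way to peek ahead, though the rule would be written differently there. The lexer records whether whitespace or a comment preceded each token, which is the information a CONCAT-inserting lexer needs. The token names and `lex` are hypothetical.

```python
import re

# Hypothetical token set with a line comment that acts like whitespace.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>//[^\n]*)                  # line comment
  | (?P<WS>[ \t]+)
  | (?P<EOL>\n)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<ASSIGN>=)
  | (?P<TEXT>(?:[^ \t\n()=/]|/(?!/))+)     # "/" allowed unless "//" follows
""", re.VERBOSE)


def lex(src: str) -> list[tuple[str, str, bool]]:
    """Return (kind, text, spaced) triples.

    spaced is True when whitespace or a comment preceded the token.
    """
    out = []
    spaced = False
    for m in TOKEN_RE.finditer(src):
        if m.lastgroup in ("WS", "COMMENT"):
            spaced = True
            continue
        out.append((m.lastgroup, m.group(), spaced))
        spaced = False
    return out


print(lex("a = b//comment stuff\n"))
# [('TEXT', 'a', False), ('ASSIGN', '=', True), ('TEXT', 'b', True), ('EOL', '\n', True)]
```

A path such as "a/b" still lexes as one TEXT token, because its "/" is not followed by a second "/".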
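For the explicit-operator idea, a Python sketch can show both the benefit and the cost. The sketch assumes ":" is the operator, as in the examples above, and reuses the same hypothetical toy token set. With whitespace discarded, both spellings lex identically. But a word such as "10:30" no longer lexes as a single TEXT token, which is why the operator character would need quoting. `lex_explicit` is a hypothetical name.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t]+)
  | (?P<EOL>\n)
  | (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<ASSIGN>=)
  | (?P<CONCAT>:)                  # the newly special character
  | (?P<TEXT>[^ \t\n()=:]+)
""", re.VERBOSE)


def lex_explicit(src: str) -> list[str]:
    """Return token kinds; whitespace is insignificant around ":"."""
    return [m.lastgroup for m in TOKEN_RE.finditer(src) if m.lastgroup != "WS"]


print(lex_explicit("t = 10:30\n"))
# ['TEXT', 'ASSIGN', 'TEXT', 'CONCAT', 'TEXT', 'EOL']
```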

Another problem I've had with low-rent syntax is how to tell the
lexer/parser to continue a source line.

Two approaches:

- Use "&" or "," or some other character as a "continue this line"
    indicator:
        a = 1 2 3 4 5 & // comment text
        6 7 8 9 10 11
    But that makes another character special, and I'd like to avoid
    that, because I'd like to keep "&" and "," available for future use.

- "\" followed by newline is removed, as in C. However, what does this
    mean?
        a = 1 2 3 \// comment
    ? Or what about
        a = 1 2 3 // comment\
    ? Does it continue the line? Does it continue the comment?
    (The latter is gross:
        a = 1 2 3 4 // comment\
        5 6 7 8
    would silently comment out the second line!)
    Or should a line containing a line comment simply not be allowed
    to continue?
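The "&" continuation approach can be handled by a line-joining pass that runs before the lexer. The Python sketch below is minimal and rests on two assumptions that are not in the post. First, "//" always starts a comment and is stripped before the check. Second, "&" counts as a continuation mark only when it is the last whitespace-separated word. `join_continued` is a hypothetical name.

```python
def join_continued(lines: list[str]) -> list[str]:
    """Join physical lines whose last word is "&" into logical lines.

    Assumptions (not from the post): "//" always starts a comment, and
    "&" is a continuation mark only when it stands alone as the last word.
    """
    logical: list[str] = []
    pending: list[str] = []
    for raw in lines:
        code = raw.split("//", 1)[0].rstrip()  # drop the comment first
        if code.split()[-1:] == ["&"]:
            pending.append(code[:-1].rstrip())  # remove the trailing "&"
            continue
        pending.append(code)
        logical.append(" ".join(pending))
        pending = []
    if pending:
        raise SyntaxError("last line ends with a continuation mark")
    return logical


print(join_continued(["a = 1 2 3 4 5 & // comment text", "6 7 8 9 10 11"]))
# ['a = 1 2 3 4 5 6 7 8 9 10 11']
```

In this sketch "&" is special only as a final, standalone word, so text such as "x&y" stays ordinary. That reserves less of the character than treating "&" as special everywhere.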
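For the backslash-newline approach, the answer to "continue the line or the comment?" depends on the order of two passes: splicing backslash-newline pairs, and stripping "//" comments. The Python sketch below applies both orders to the example from the post. It assumes "//" always starts a comment. The names `splice` and `strip_comments` are hypothetical.

```python
def strip_comments(text: str) -> str:
    """Remove "//" up to (not including) each newline."""
    return "\n".join(line.split("//", 1)[0] for line in text.split("\n"))


def splice(text: str) -> str:
    """Delete every backslash-newline pair, as C does."""
    return text.replace("\\\n", "")


src = "a = 1 2 3 4 // comment\\\n5 6 7 8\n"

# Splice first: the comment swallows the second line.
print(repr(strip_comments(splice(src))))   # 'a = 1 2 3 4 \n'
# Strip comments first: the backslash disappears with the comment,
# so the line is not continued at all.
print(repr(splice(strip_comments(src))))   # 'a = 1 2 3 4 \n5 6 7 8\n'
```

The first order gives the "gross" reading described above. It is also the order of C's translation phases, where line splicing happens before comments are removed. The second order makes a backslash inside a comment inert.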

Any other ideas? Which looks best?

Low-rent syntax is a nice idea, but it's got some subtle problems in
certain cases, no?

--
Tim McDaniel Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300 Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com UUCP: {uunet,sharkey}!amara!mcdaniel
--
