|Writing a recursive descent parser in C firstname.lastname@example.org (2001-11-29)|
|Re: Writing a recursive descent parser in C email@example.com (2001-12-03)|
|Re: Writing a recursive descent parser in C firstname.lastname@example.org (Bill Rayer) (2001-12-07)|
|Re: 4GL language design, was Writing a recursive descent parser in C email@example.com (2001-12-09)|
|Re: 4GL language design, was Writing a recursive descent parser in C firstname.lastname@example.org (2001-12-11)|
|Re: 4GL language design, was Writing a recursive descent parser in C email@example.com (Bill Rayer) (2001-12-11)|
|From:||"Bill Rayer" <firstname.lastname@example.org>|
|Date:||11 Dec 2001 21:31:13 -0500|
|Organization:||Virgin Net Usenet Service|
|References:||01-11-146 01-12-008 01-12-020 01-12-040|
|Keywords:||parse, design, comment|
|Posted-Date:||11 Dec 2001 21:31:13 EST|
> > I'm interested that some 4GLs mix up the scanning and parsing stages.
> > What 4GLs do you consider to be most deficient in this way? And what
> Most have been fortunately dropped from use, but a good example might
> be various flavors of Basic implemented in the 1970s for a range of
> minicomputers. The use of postfix type characters by older Basics and
> as implemented in these products is one confusion of the scanning and
> parsing phases because the handling of the postfix type operator
> belongs in no clear and decidable sense to either the scanner or the
> parser. [snip]
I did read Kemeny & Kurtz's book "Back to Basic", and my understanding
is that the only type character they wanted was $ for strings. They
disapproved of the large number of type characters used by other Basics
(e.g. Microsoft QuickBASIC has six that I can recall). But the type
character was always part of the identifier; it was never intended as
a separate symbol.
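A minimal C sketch of that point, assuming a hypothetical BASIC-like lexical rule (a letter, then letters or digits, then at most one type character from the set `$%&!#`). The scanner swallows the type character into the identifier token, so the parser never sees it as a separate symbol. The function name `scan_identifier` and the suffix set are illustrative assumptions, not taken from any particular Basic.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical suffix set, loosely modelled on Microsoft BASIC dialects. */
static const char *TYPE_SUFFIXES = "$%&!#";

/* Scan one identifier starting at src[*pos] into buf (capacity cap > 0).
   The optional type character is consumed as part of the same token.
   Returns the length copied into buf, or 0 if no identifier starts here. */
static size_t scan_identifier(const char *src, size_t *pos, char *buf, size_t cap)
{
    size_t start = *pos, i = *pos, len;

    if (!isalpha((unsigned char)src[i]))
        return 0;
    while (isalnum((unsigned char)src[i]))
        i++;
    if (src[i] != '\0' && strchr(TYPE_SUFFIXES, src[i]) != NULL)
        i++;                            /* type character belongs to the identifier */
    len = i - start;
    if (len >= cap)
        len = cap - 1;                  /* truncate rather than overflow buf */
    memcpy(buf, src + start, len);
    buf[len] = '\0';
    *pos = i;
    return len;
}
```

For example, scanning `NAME$ = A1%` from position 0 yields the single token `NAME$` and leaves the position at the following space.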
I was interested in your comments about mixing scanning and parsing
because I'm reading the XML syntax (www.w3.org/TR/REC-xml). Putting
aside XML's merits, I was uneasy reading the syntax because I can't
tell whether it mixes the scanner and the parser or not! I'm used to
syntaxes that work on two levels: first you define the tokens ("begin",
"end", identifier, signed_integer, etc.), then you define the syntax
that says how the tokens fit together (block ::= BEGIN statement ";"
END, etc.). The tokens are recognized by the scanner, and the syntax is
represented by the structure of the recursive subroutines.
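To make the two levels concrete, here is a small C sketch under stated assumptions: the scanner (`next`) recognizes the tokens and discards whitespace, and each grammar rule becomes one function. The rule `statement ::= identifier | block` is a hypothetical stand-in, since `statement` is left undefined above, and keywords are assumed to be lowercase.

```c
#include <ctype.h>
#include <string.h>

/* Level 1: token kinds produced by the scanner. */
typedef enum { T_BEGIN, T_END, T_IDENT, T_SEMI, T_EOF, T_ERROR } Token;

static const char *src;   /* remaining input text */
static Token tok;         /* current lookahead token */

/* Scanner: skips whitespace, so the grammar functions never mention spaces. */
static void next(void)
{
    while (isspace((unsigned char)*src))
        src++;
    if (*src == '\0') { tok = T_EOF; return; }
    if (*src == ';')  { src++; tok = T_SEMI; return; }
    if (isalpha((unsigned char)*src)) {
        const char *start = src;
        size_t n;
        while (isalnum((unsigned char)*src))
            src++;
        n = (size_t)(src - start);
        if (n == 5 && strncmp(start, "begin", 5) == 0)    tok = T_BEGIN;
        else if (n == 3 && strncmp(start, "end", 3) == 0) tok = T_END;
        else                                              tok = T_IDENT;
        return;
    }
    tok = T_ERROR;
}

/* Level 2: one function per production; each returns 1 on success. */
static int block(void);

/* statement ::= identifier | block   (hypothetical) */
static int statement(void)
{
    if (tok == T_IDENT) { next(); return 1; }
    if (tok == T_BEGIN) return block();
    return 0;
}

/* block ::= BEGIN statement ";" END */
static int block(void)
{
    if (tok != T_BEGIN) return 0;
    next();
    if (!statement()) return 0;
    if (tok != T_SEMI) return 0;
    next();
    if (tok != T_END) return 0;
    next();
    return 1;
}

/* Parse a whole input: exactly one block followed by end of text. */
static int parse(const char *text)
{
    src = text;
    next();
    return block() && tok == T_EOF;
}
```

With this sketch, `parse("begin x ; end")` returns 1, while `parse("begin x end")` returns 0 because the `;` is missing. Whitespace handling lives entirely in `next`.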
At this point I should add that my compiler-writing experience is
limited to recursive descent parsers in Pascal and Delphi. As was ably
explained at the start of this thread, it's easy to write an RDP if you
can define a language on two levels: (1) the tokens, which are
definable using regular expressions, and (2) the syntax, using EBNF.
Given this information, the code follows naturally.
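As a sketch of how that mapping works, assume a hypothetical rule `int_list ::= signed_integer { "," signed_integer }`, with the token defined by the regular expression `[+-]?[0-9]+`. The token is checked in one function, and the EBNF repetition `{ ... }` becomes a `while` loop. The names `int_list`, `accept_char` and `scan_signed_integer` are illustrative only.

```c
#include <ctype.h>
#include <stdlib.h>

/* Skip blanks between tokens. */
static const char *skip_space(const char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Token level: signed_integer = [+-]?[0-9]+ (a regular expression).
   On a match, stores the value, advances *p past the token and returns 1. */
static int scan_signed_integer(const char **p, long *value)
{
    const char *s = skip_space(*p);
    const char *d = (*s == '+' || *s == '-') ? s + 1 : s;
    char *end;

    if (!isdigit((unsigned char)*d))
        return 0;                      /* does not match [+-]?[0-9]+ */
    *value = strtol(s, &end, 10);      /* overflow is not checked in this sketch */
    *p = end;
    return 1;
}

/* Consume the one-character token c if it comes next; return 1 if consumed. */
static int accept_char(const char **p, char c)
{
    const char *s = skip_space(*p);
    if (*s != c)
        return 0;
    *p = s + 1;
    return 1;
}

/* Syntax level (EBNF):  int_list ::= signed_integer { "," signed_integer }
   Returns the number of items (and their sum), or -1 on a syntax error. */
static int int_list(const char *text, long *sum)
{
    const char *p = text;
    long v;
    int count = 0;

    *sum = 0;
    if (!scan_signed_integer(&p, &v))
        return -1;
    *sum += v;
    count++;
    while (accept_char(&p, ',')) {     /* the { ... } repetition */
        if (!scan_signed_integer(&p, &v))
            return -1;
        *sum += v;
        count++;
    }
    return *skip_space(p) == '\0' ? count : -1;   /* all input must be used */
}
```

With this sketch, `int_list("1, -2, +3", &sum)` returns 3 and sets `sum` to 2, while `"1,,2"` and `"1 2"` are rejected.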
What bothers me about XML is that it has a separate production for
whitespace (production ). I always thought that if tokens are separated
by whitespace, an EBNF syntax never had to worry about spaces. But XML
specifies tags similar to:
'<' Name S? '>'
i.e. an opening pointy bracket followed immediately by a Name production
(similar to a normal identifier), followed by an optional space
production (S, one or more whitespace characters), followed by a
closing pointy bracket.
So by having a separate production for spaces, does XML mix up the
scanning and parsing stages? And does it matter if it does? I would be
interested in anyone's views on this, not least because I'm trying to
modify a parser to work with it!
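To make the question concrete, here is a hypothetical C sketch of a scannerless approach to the simplified tag `'<' Name S? '>'` (no attributes). The parser reads characters directly, and whitespace is an ordinary production, so it is accepted only where the grammar allows it. The `S` rule follows the XML 1.0 definition `S ::= (#x20 | #x9 | #xD | #xA)+`; the ASCII-only `Name` rule is a simplifying assumption, since real XML names may contain many non-ASCII characters.

```c
#include <string.h>

/* S ::= (#x20 | #x9 | #xD | #xA)+  (whitespace as an ordinary production) */
static int is_s_char(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* ASCII-only subset of XML NameStartChar / NameChar (an assumption). */
static int is_name_start(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
}

static int is_name_char(char c)
{
    return is_name_start(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
}

/* Simplified start tag without attributes:  '<' Name S? '>'
   Parses characters directly (no separate scanner). On success copies the
   name into out (capacity cap) and returns 1; otherwise returns 0. */
static int parse_start_tag(const char *s, char *out, size_t cap)
{
    size_t i = 0, start, len;

    if (s[i++] != '<')
        return 0;
    if (!is_name_start(s[i]))          /* no S allowed here, so "< a>" fails */
        return 0;
    start = i;
    while (is_name_char(s[i]))
        i++;
    len = i - start;
    while (is_s_char(s[i]))            /* S? : optional run of whitespace */
        i++;
    if (s[i] != '>' || len >= cap)
        return 0;
    memcpy(out, s + start, len);
    out[len] = '\0';
    return 1;
}
```

This accepts `<para>` and `<para \n>` but rejects `< para>`, a distinction that a scanner silently discarding whitespace between tokens could not make.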
[Parsing XML is indeed pretty yucky. -John]