From: | "Bill Rayer" <lingolanguage@hotmail.com> |
Newsgroups: | comp.compilers |
Date: | 11 Dec 2001 21:31:13 -0500 |
Organization: | Virgin Net Usenet Service |
References: | 01-11-146 01-12-008 01-12-020 01-12-040 |
Keywords: | parse, design, comment |
Posted-Date: | 11 Dec 2001 21:31:13 EST |
Dear Newsgroup
> > I'm interested that some 4GLs mix up the scanning and parsing stages.
> > What 4GLs do you consider to be most deficient in this way? And what
>
> Most have been fortunately dropped from use, but a good example might
> be various flavors of Basic implemented in the 1970s for a range of
> minicomputers. The use of postfix type characters by older Basics and
> as implemented in these products is one confusion of the scanning and
> parsing phases because the handling of the postfix type operator
> belongs in no clear and decidable sense to neither the scanner or the
> parser. [snip]
I did read Kemeny & Kurtz's book "Back to BASIC" and understood that the
only type character they wanted was $ for strings. They disapproved of
the large number of type characters used by other Basics (e.g. Microsoft
QuickBASIC has 6 that I can recall). But the type character was always part
of the identifier; it was never intended as a separate symbol.
I was interested in your comments about mixing scanning and parsing
because I'm reading the XML syntax (www.w3.org/TR/REC-xml). Putting
aside XML's merits, I was uneasy reading the syntax as I can't tell
whether it mixes the scanner and the parser or not! I'm used to
syntaxes that work on two levels - you define the tokens ("begin",
"end", identifier, signed_integer etc), then you define the syntax
that says how the tokens fit together (block ::= BEGIN statement ";"
END etc). The tokens are processed in the scanner and the syntax is
represented by the structure of recursive subroutines.
At this point I should add my compiler writing experience is limited
to recursive descent parsers in Pascal and Delphi. As was ably
explained at the start of this thread, it's easy to write a RDP if you
can define a language on two levels: (1) the tokens which are
definable using regular expressions and (2) the syntax using EBNF.
Given this information, the code follows naturally.
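
To make the two-level idea concrete, here is a minimal sketch in Python. It is not code from this thread: the token set, the toy grammar (block ::= BEGIN statement { ";" statement } END, statement ::= IDENT ":=" INT | block) and all function names are assumptions chosen for illustration. The scanner is driven by regular expressions and discards whitespace; the parser has one method per EBNF rule and never sees a space.

```python
import re

# Level 1: tokens, each defined by a regular expression (hypothetical token set).
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<INT>[+-]?[0-9]+)
  | (?P<SEMI>;)
  | (?P<ASSIGN>:=)
""", re.VERBOSE)

KEYWORDS = {"begin": "BEGIN", "end": "END"}


def scan(text):
    """Return a list of (kind, lexeme) pairs; whitespace is dropped here."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue  # the parser never has to think about spaces
        if kind == "IDENT":
            kind = KEYWORDS.get(m.group().lower(), "IDENT")
        tokens.append((kind, m.group()))
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Level 2: one method per EBNF rule of the toy grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i][0]

    def expect(self, kind):
        tok_kind, lexeme = self.tokens[self.i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, found {tok_kind} {lexeme!r}")
        self.i += 1
        return lexeme

    def block(self):
        # block ::= BEGIN statement { ";" statement } END
        self.expect("BEGIN")
        stmts = [self.statement()]
        while self.peek() == "SEMI":
            self.expect("SEMI")
            stmts.append(self.statement())
        self.expect("END")
        return ("block", stmts)

    def statement(self):
        # statement ::= IDENT ":=" INT | block
        if self.peek() == "BEGIN":
            return self.block()
        name = self.expect("IDENT")
        self.expect("ASSIGN")
        value = int(self.expect("INT"))
        return ("assign", name, value)


def parse(text):
    """Scan, parse a single block, and require end of input afterwards."""
    p = Parser(scan(text))
    tree = p.block()
    p.expect("EOF")
    return tree
```

With this split, parse("begin x:=1 end") and parse("begin   x := 1\nend") build the same tree, because every whitespace decision is made in scan() before the grammar rules run.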
What bothers me with XML is having a separate production for space
(production [3]). I always thought if tokens are separated by
whitespace, an EBNF syntax never had to worry about spaces. But XML
specifies tags similar to:
'<' Name S? '>'
i.e. an opening pointy bracket followed immediately by a Name production
(similar to a normal identifier), followed by an optional space
production (S, one or more white space characters), followed by a
closing pointy bracket.
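
For comparison, here is a sketch of the simplified tag form quoted above, '<' Name S? '>', written directly over characters with no separate scanner. The function names are hypothetical, and Name is restricted to ASCII letters, digits and "_:.-" as a simplifying assumption (the real Name production admits many more characters). The point it illustrates is that whitespace is allowed in one place and forbidden in another, so a scanner that silently discards spaces between tokens would lose information the grammar needs.

```python
XML_SPACE = " \t\r\n"  # production [3]: S ::= (#x20 | #x9 | #xD | #xA)+


def is_name_start(ch):
    # ASCII-only simplification of the first character of Name (assumption).
    return ch.isascii() and (ch.isalpha() or ch in "_:")


def is_name_char(ch):
    # ASCII-only simplification of the remaining characters of Name (assumption).
    return ch.isascii() and (ch.isalnum() or ch in "_:.-")


def parse_s(text, pos):
    """S: one or more white space characters. Returns the new position or None."""
    end = pos
    while end < len(text) and text[end] in XML_SPACE:
        end += 1
    return end if end > pos else None


def parse_name(text, pos):
    """Name: returns the position just past the name, or None if none starts here."""
    if pos >= len(text) or not is_name_start(text[pos]):
        return None
    end = pos + 1
    while end < len(text) and is_name_char(text[end]):
        end += 1
    return end


def parse_simple_tag(text, pos=0):
    """'<' Name S? '>': returns (name, new_pos) or raises SyntaxError."""
    if not text.startswith("<", pos):
        raise SyntaxError(f"expected '<' at {pos}")
    name_end = parse_name(text, pos + 1)
    if name_end is None:
        raise SyntaxError("a Name must follow '<' immediately")
    name = text[pos + 1:name_end]
    after_s = parse_s(text, name_end)
    pos = after_s if after_s is not None else name_end  # S is optional
    if not text.startswith(">", pos):
        raise SyntaxError(f"expected '>' at {pos}")
    return name, pos + 1
```

Here parse_simple_tag("<a >") succeeds while parse_simple_tag("< a>") raises SyntaxError, which is the distinction a whitespace-skipping scanner cannot express without extra help.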
So by having a separate production for spaces, does XML mix up the
scanning and parsing stages? And does it matter if they do? I would be
interested in anyone's views on this, not least because I'm trying to
modify a parser to work with it!
Regards
Bill Rayer
[Parsing XML is indeed pretty yucky. -John]