Re: errors in Java programs, was Compiler Books? Parsers? firstname.lastname@example.org (Joachim Durchholz) (2003-12-27)
From: Joachim Durchholz <email@example.com>
Date: 27 Dec 2003 14:12:46 -0500
Organization: Oberberg Online Infosysteme
References: 03-10-113 03-10-145 03-11-010 03-11-083 03-12-017 03-12-116 03-12-125 03-12-132
Posted-Date: 27 Dec 2003 14:12:46 EST
Carl Cerecke wrote:
> Chris F Clark wrote:
>>The second point on this topic, which I think I mentioned in another
>>thread, is that many (most to my mind) errors are actually sub-lexical
>>occurring at the single character level and not at the parsing level.
> As part of my recently completed PhD, I analysed about 200,000
> incorrect Java programs written by novice programmers. Nearly all are
> correct lexically.
Hmm... I think that mostly-lexically-correct property is correlated with
the lack of experience on the side of the programmers.
Once people get used to the language, syntax errors tend to vanish
almost entirely, and what remains are single-character typos which tend
to either mangle a name (generating a "name not declared" error, which
isn't in the domain of parsing), or damage a token so that it goes into
another token class (generating a lexical error, i.e. not a parse error).
So I guess the kind of error handling needed depends largely on
>>Even most hand-written parsers use some form of separate lexer (which
>>is mostly context insensitive), so when I make the error of omitting
>>the closing quote from my character string, it swallows far too much
>>relevant text and the resulting error recovery isn't important,
>>because the basic problem, the missing character, is not in the parser's
>>purview at all.
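A minimal sketch of the "swallowing" effect described in the quote above, assuming a deliberately naive lexer in which a string runs from one double quote to the next, even across line breaks. The function name `lex_strings` and the chunk labels are hypothetical, chosen only for illustration; real lexers differ in how they treat newlines and escapes.

```python
def lex_strings(source):
    """Split source into ("STRING", text) and ("CODE", text) chunks.

    Naive on purpose: a string starts at a double quote and ends at the
    next double quote, wherever that is. Escape sequences are ignored.
    """
    chunks = []
    pos = 0
    while pos < len(source):
        if source[pos] == '"':
            end = source.find('"', pos + 1)
            if end == -1:
                # Unterminated string: it swallows the rest of the input.
                end = len(source) - 1
            chunks.append(("STRING", source[pos:end + 1]))
            pos = end + 1
        else:
            nxt = source.find('"', pos)
            if nxt == -1:
                nxt = len(source)
            chunks.append(("CODE", source[pos:nxt]))
            pos = nxt
    return chunks


good = 'x = "hi";\ny = "there";\n'
bad = 'x = "hi;\ny = "there";\n'   # closing quote after hi is missing

good_chunks = lex_strings(good)
bad_chunks = lex_strings(bad)
```

With the closing quote missing, the first string extends to the opening quote of the next literal, so the code between the two lines ends up inside a string and what was meant as string content (`there`) is handed to the parser as code. No amount of parser-level recovery sees the original structure any more.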
> .....the most difficult errors to repair mostly fall into two related
> categories: comment delimiter problems, and string delimiter problems.
> Some novice programmers really have a problem remembering /* opens a
> comment, and */ closes a comment - often transposing one or the other or
> both. Suddenly, the parser is asked to make sense of the tokens <star>
> <slash> <ident> <ident> <ident> ... and gets rather confused. Seeing
> this has convinced me that it is better for the comment delimiters of a
> language to be single-character tokens that are not used for any other
> purpose. Also, notice how the above stream of tokens could look like a
> malformed expression to a parser attempting recovery.
Actually, in my learning days, I have seen a parser complain about a
missing expression between * and / in code like this:
a = */ now things get hairy */
It took me several seconds to figure out what was happening (but having
read about lexing and parsing, I knew where that error originated - I've
got no idea how a complete newbie to programming would have fared).
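To make the token stream concrete, here is a small sketch of a toy lexer, assuming only a handful of token classes. The names `TOKEN_SPEC` and `tokenize` are hypothetical, and this is not how any particular compiler does it. It shows what the parser is handed when the delimiters are transposed.

```python
import re

# Hypothetical token classes for a toy C-like lexer.
# COMMENT is listed before SLASH so that "/*" starts a comment when it can.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),      # a well-formed /* ... */ comment
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("ASSIGN",  r"="),
    ("STAR",    r"\*"),
    ("SLASH",   r"/"),
    ("SKIP",    r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(text):
    """Return the token kinds a parser would see (comments and blanks dropped).

    Characters that match no token class are silently skipped in this sketch.
    """
    return [
        m.lastgroup
        for m in MASTER.finditer(text)
        if m.lastgroup not in ("COMMENT", "SKIP")
    ]


transposed = tokenize("a = */ now things get hairy */")
intended = tokenize("a = /* now things get hairy */")
```

For the transposed line the parser receives an identifier, an assignment, then a star directly followed by a slash, then four identifiers and another star and slash. That is why the complaint is about a missing expression between `*` and `/`. For the intended line the comment disappears in the lexer and the parser sees only the identifier and the assignment.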
However, I think that the main problem here is that C uses the same
characters for the begin and end comment delimiters. If comments were
delimited using (* ... *), I'm pretty sure that any errors would stick
out to the view of the programmer, so even if he doesn't understand the
error message he'll quickly see what to fix.
Actually (*...*) is the Pascal convention, and I /never/ had problems
with comment delimiters when I learnt programming using Pascal. Of
course, that's just anecdotal evidence; a wider statistical base would
be needed to say anything with certainty.