Related articles |
---|
Re: Programming language and IDE design martin@gkc.org.uk (Martin Ward) (2013-11-16) |
Re: Programming language and IDE design gneuner2@comcast.net (George Neuner) (2013-11-18) |
Re: Programming language and IDE design hu47121@usenet.kitty.sub.org (2014-03-02) |
Re: LL vs LR, was Programming language and IDE design ivan@ootbcomp.com (Ivan Godard) (2014-03-01) |
Re: LL vs LR, was Programming language and IDE design kaz@kylheku.com (Kaz Kylheku) (2014-03-02) |
From: | Kaz Kylheku <kaz@kylheku.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 2 Mar 2014 18:41:00 +0000 (UTC) |
Organization: | Aioe.org NNTP Server |
References: | 13-11-016 13-11-018 14-03-003 14-03-004 |
Keywords: | C, parse, tools |
Posted-Date: | 03 Mar 2014 06:26:49 EST |
On 2014-03-02, Ivan Godard <ivan@ootbcomp.com> wrote:
> On 3/1/2014 4:03 PM, Hannah wrote:
><snip>
>
>> So the C grammar is not LL(k) for any k, either. You can probably parse
>> that particular part of the C/C++ grammar easily using LR.
>>
>> Hannah.
>
> The standard way to handle this, and many such other cases where a
> formal grammar is not LL, is to introduce a function (in recursive
> descent) or rule (in parser generators) that recognizes "parenthesized
> thingy" and then figure it out later. Yes, that amounts to LR, but you
> only need it in a few specialized places. There's an equivalent trick
> in "DOI=1," in Fortran, where you let the lexer resolve it so the
> parser sees proper lexemes, not characters.
You can parse "anything" with LL(k) with backtracking. What you do is
you try various different parses, in a carefully designed order, until
something matches, with backtracking.
> [C++ is unusually bad, with some parts just plain ambiguous. In the example
> a few messages back, I gather that the rule is that if you can parse it as a
> declaration, it's a declaration, otherwise it's something else. -John]
Years ago (1999) I wrote a parser for C expression syntax (only the expression
syntax) as part of a module which could diagnose, at run-time, unsafe uses of
a C macro that contains multiple evaluations:
The source code is here:
http://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c
E.g. dumb example:
#define INCR(A) (SFX_CHECK(A) = (A) + 1)
This SFX_CHECK (actually KAZLIB_SFX_CHECK, for hygiene) expands to code which
produces the value of (A), but also hands the expression "A" to a run-time
checker which answers the question, does the C expression "A" have side
effects?
The answer to the question is "yes, no, or possibly".
(Also, the expression "A" is put into a hash table which caches the
result for next time, so the checking is faster, when turned on.)
If the answer is other than no, then a diagnostic is printed, with the
expression "A", and file name and line number where that macro was
expanded. The diagnostic tells you whether there is a confirmed problem
in that line or just suspicion.
The parser has way to know how a symbol is declared, and so it uses
backtracking techniques.
In fact, an exception library based on setjmp/longjmp is used for some
backtracking returns. There is logic like, "if parsing it this way throws a
syntax error, try it this way".
For instance, what is "(a)(b)"? This could be that function a is being
called with argument b. Or it could be that a is a type, and b is being
cast to that type.
I remember parts of this being somewhat tough to write, but it looks fairly
transparent to me today. That was about a year before I discovered Lisp.
Return to the
comp.compilers page.
Search the
comp.compilers archives again.