Re: LL vs LR, was Programming language and IDE design

Kaz Kylheku <kaz@kylheku.com>
Sun, 2 Mar 2014 18:41:00 +0000 (UTC)

          From comp.compilers

Related articles
Re: Programming language and IDE design martin@gkc.org.uk (Martin Ward) (2013-11-16)
Re: Programming language and IDE design gneuner2@comcast.net (George Neuner) (2013-11-18)
Re: Programming language and IDE design hu47121@usenet.kitty.sub.org (2014-03-02)
Re: LL vs LR, was Programming language and IDE design ivan@ootbcomp.com (Ivan Godard) (2014-03-01)
Re: LL vs LR, was Programming language and IDE design kaz@kylheku.com (Kaz Kylheku) (2014-03-02)
| List of all articles for this month |
From: Kaz Kylheku <kaz@kylheku.com>
Newsgroups: comp.compilers
Date: Sun, 2 Mar 2014 18:41:00 +0000 (UTC)
Organization: Aioe.org NNTP Server
References: 13-11-016 13-11-018 14-03-003 14-03-004
Keywords: C, parse, tools
Posted-Date: 03 Mar 2014 06:26:49 EST

On 2014-03-02, Ivan Godard <ivan@ootbcomp.com> wrote:
> On 3/1/2014 4:03 PM, Hannah wrote:
><snip>
>
>> So the C grammar is not LL(k) for any k, either. You can probably parse
>> that particular part of the C/C++ grammar easily using LR.
>>
>> Hannah.
>
> The standard way to handle this, and many such other cases where a
> formal grammar is not LL, is to introduce a function (in recursive
> descent) or rule (in parser generators) that recognizes "parenthesized
> thingy" and then figure it out later. Yes, that amounts to LR, but you
> only need it in a few specialized places. There's an equivalent trick
> in "DOI=1," in Fortran, where you let the lexer resolve it so the
> parser sees proper lexemes, not characters.


You can parse "anything" with LL(k) with backtracking. What you do is
you try various different parses, in a carefully designed order, until
something matches, with backtracking.


> [C++ is unusually bad, with some parts just plain ambiguous. In the example
> a few messages back, I gather that the rule is that if you can parse it as a
> declaration, it's a declaration, otherwise it's something else. -John]


Years ago (1999) I wrote a parser for C expression syntax (only the expression
syntax) as part of a module which could diagnose, at run-time, unsafe uses of
a C macro that contains multiple evaluations:


The source code is here:


http://git.savannah.gnu.org/cgit/kazlib.git/tree/sfx.c


E.g. dumb example:


    #define INCR(A) (SFX_CHECK(A) = (A) + 1)


This SFX_CHECK (actually KAZLIB_SFX_CHECK, for hygiene) expands to code which
produces the value of (A), but also hands the expression "A" to a run-time
checker which answers the question, does the C expression "A" have side
effects?


The answer to the question is "yes, no, or possibly".


(Also, the expression "A" is put into a hash table which caches the
result for next time, so the checking is faster, when turned on.)


If the answer is other than no, then a diagnostic is printed, with the
expression "A", and file name and line number where that macro was
expanded. The diagnostic tells you whether there is a confirmed problem
in that line or just suspicion.


The parser has way to know how a symbol is declared, and so it uses
backtracking techniques.


In fact, an exception library based on setjmp/longjmp is used for some
backtracking returns. There is logic like, "if parsing it this way throws a
syntax error, try it this way".


For instance, what is "(a)(b)"? This could be that function a is being
called with argument b. Or it could be that a is a type, and b is being
cast to that type.


I remember parts of this being somewhat tough to write, but it looks fairly
transparent to me today. That was about a year before I discovered Lisp.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.