Re: C scanners, was Hand written or tool generated lexical analyzers for FORTRAN

nmm1@cus.cam.ac.uk (Nick Maclaren)
2 Oct 2005 02:48:12 -0400

          From comp.compilers


From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Newsgroups: comp.compilers
Date: 2 Oct 2005 02:48:12 -0400
Organization: University of Cambridge, England
References: 05-09-054 05-09-069 05-09-127 05-09-137
Keywords: lex, C
Posted-Date: 02 Oct 2005 02:48:12 EDT

  Russ Cox <rsc@swtch.com> wrote:
>> How about an idea where lexer is divided into two layers. First is a
>> scanner which takes character input and returns semi-classified tokens
>> but the second stage, the lexer, will completely resolve the remaining
>> tokens. i.e. second stage will classify keywords and identifiers.
>
>This is exactly what C compilers with built-in preprocessors are
>already forced to do. Since the preprocessor treats all words the
>same, regardless of whether they are reserved words, type names, or
>identifiers, it does what you call first stage lexing. Once the
>preprocessor is finished, the compiler proper handles your second
>stage lexing (with some help from the parser).
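
The two-stage scheme in the quoted text can be sketched roughly as follows. This is only an illustration, not how any particular compiler is written. The names scan, classify, raw_token and the short keyword table are all invented for the example. Stage one splits characters into words, numbers and punctuation without knowing what a keyword is. Stage two resolves each word into a keyword or an identifier.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds for the two stages. */
enum raw_kind { RAW_WORD, RAW_NUMBER, RAW_PUNCT, RAW_END };
enum tok_kind { TOK_KEYWORD, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_END };

struct raw_token {
    enum raw_kind kind;
    const char *start;   /* first character of the token */
    size_t len;          /* number of characters in the token */
};

/* Stage 1: the scanner. It returns semi-classified tokens and treats
   every word alike, much as a preprocessor does. */
struct raw_token scan(const char **p)
{
    const char *s;
    struct raw_token t;

    assert(p != NULL && *p != NULL);
    s = *p;
    while (isspace((unsigned char)*s))
        s++;
    t.start = s;
    if (*s == '\0') {
        t.kind = RAW_END;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        t.kind = RAW_WORD;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s))
            s++;
        t.kind = RAW_NUMBER;
    } else {
        s++;                       /* single-character punctuation only */
        t.kind = RAW_PUNCT;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}

/* Stage 2: the lexer proper. It resolves words into keywords or
   identifiers using an illustrative keyword table. */
enum tok_kind classify(struct raw_token t)
{
    static const char *const keywords[] = { "if", "else", "while", "return", "int" };
    size_t i;

    switch (t.kind) {
    case RAW_NUMBER: return TOK_NUMBER;
    case RAW_PUNCT:  return TOK_PUNCT;
    case RAW_END:    return TOK_END;
    case RAW_WORD:   break;
    }
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == t.len &&
            memcmp(keywords[i], t.start, t.len) == 0)
            return TOK_KEYWORD;
    return TOK_IDENT;
}
```

Calling classify(scan(&p)) repeatedly on a string such as "while x1 42 ;" yields a keyword, an identifier, a number, a punctuator and then the end marker. A real C front end would also need type names resolved with help from the parser, which this sketch leaves out.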


To some extent, yes. One highly confusing aspect is that translation
phase 4 needs to invoke phase 7 in order to handle expressions in #if
directives, and this introduces several ambiguities into the language.
I now forget the details, as it is not my main area of expertise, but
people on the BSI C panel found several.
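
One well-known consequence of evaluating expressions during preprocessing can be sketched as below. It is not necessarily one of the ambiguities the panel found; it only shows that phase 4 evaluates an expression without the declarations that phase 7 sees. In a #if expression, any identifier left after macro expansion is replaced by 0. The names LIMIT, PP_SEES_ZERO, pp_view, compiler_view and check_views are invented for the example.

```c
#include <assert.h>

enum { LIMIT = 10 };   /* known only to the compiler proper (phase 7) */

/* Phase 4 evaluates this expression. LIMIT is not a macro, so the
   preprocessor replaces it with 0 and the test succeeds. */
#if LIMIT == 0
#define PP_SEES_ZERO 1
#else
#define PP_SEES_ZERO 0
#endif

/* What the preprocessor concluded about LIMIT == 0. */
int pp_view(void) { return PP_SEES_ZERO; }

/* What the compiler proper concludes about the same expression. */
int compiler_view(void) { return LIMIT == 0; }

/* The two phases disagree about the same source text. */
void check_views(void)
{
    assert(pp_view() == 1);
    assert(compiler_view() == 0);
}
```

Some compilers can warn about this on request (for example, an option that reports undefined identifiers in #if), but the behaviour itself is what the language specifies.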


But, if you exclude that, and the more contorted uses of the # and ##
preprocessor operators, I believe that most C compilers do work the
way you say. There are some truly horrible ambiguities with those
operators, but thankfully nobody seems to use the constructions that
trigger them.


        #include <stddef.h>
        #define A(x) # x
        A(offsetof)


is one of the simpler examples, and one of the more unambiguously
implementation-dependent.
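
For readers less familiar with the # operator, the following sketch shows its ordinary, well-defined behaviour with a user-defined macro. It does not attempt to reproduce the implementation-dependent case above. The names VALUE, STR, XSTR, direct, indirect and check_stringize are invented for the example. The operand of # is not macro-expanded before being turned into a string, so a second level of macro is needed to stringize the expansion.

```c
#include <assert.h>
#include <string.h>

#define VALUE 42
#define STR(x) #x         /* stringizes the argument as written */
#define XSTR(x) STR(x)    /* expands the argument first, then stringizes */

/* The argument of # is not expanded, so this is the string "VALUE". */
const char *direct(void) { return STR(VALUE); }

/* The extra level lets VALUE expand first, so this is the string "42". */
const char *indirect(void) { return XSTR(VALUE); }

void check_stringize(void)
{
    assert(strcmp(direct(), "VALUE") == 0);
    assert(strcmp(indirect(), "42") == 0);
}
```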




Regards,
Nick Maclaren.

