From: | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
Newsgroups: | comp.compilers |
Date: | Mon, 14 Mar 2022 19:43:22 +0100 |
Organization: | Compilers Central |
References: | 22-03-019 22-03-025 22-03-028 |
Keywords: | parse, design |
Posted-Date: | 14 Mar 2022 14:50:21 EDT |
On 3/12/22 1:11 PM, Christopher F Clark wrote:
> Contrary to what one might assume from my previous posting on this topic,
> I agree with Dodi.
>
> Sometimes, the right answer is another phase. To keep your lexer
> simple, it can be useful to have a separate phase that deals with
> "character" issues, whether that is transforming UTF-8 extensions into
> unique code points (or actual characters representing glyphs possibly
> accented, i.e. resolving the combining code points into canonical
> versions) or taking sequences like &amp; or \n or whatever into single
> tokens (or characters). That *can* make the whole process simpler and
> faster.
I consider these "phases" to be "filters". My C parser also had several
filter levels, each handling one aspect of preprocessor macro
substitution and conditional compilation in detail. The parser calls the
top-level filter to get the next C token; that filter in turn calls the
lower-level filters until all levels have supplied enough information to
deliver the next token to the parser.
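
To make the idea concrete, here is a minimal sketch of such a pull-style filter chain. It is written in Python for brevity and is not the code of the parser described above. The three levels (line splicing, tokenizing, object-like macro substitution), the function names and the `LIMIT` macro are all assumptions chosen for illustration, and the macro filter does no rescanning or conditional compilation.

```python
from typing import Dict, Iterator, List


def splice_lines(text: str) -> Iterator[str]:
    """Lowest filter: delete backslash-newline pairs and yield characters."""
    i = 0
    while i < len(text):
        if text.startswith("\\\n", i):
            i += 2  # skip the line continuation
            continue
        yield text[i]
        i += 1


def tokenize(chars: Iterator[str]) -> Iterator[str]:
    """Middle filter: group characters into words or one-character tokens."""
    word = ""
    for ch in chars:
        if ch.isalnum() or ch == "_":
            word += ch
            continue
        if word:
            yield word
            word = ""
        if not ch.isspace():
            yield ch
    if word:
        yield word


def expand_macros(tokens: Iterator[str],
                  macros: Dict[str, List[str]]) -> Iterator[str]:
    """Top filter: replace object-like macro names by their token lists."""
    for tok in tokens:
        yield from macros.get(tok, [tok])


def c_tokens(source: str, macros: Dict[str, List[str]]) -> Iterator[str]:
    """What the parser sees: each request is pulled through all levels."""
    return expand_macros(tokenize(splice_lines(source)), macros)


# Hypothetical input: an identifier split by a line continuation, plus a macro.
source = "int cou\\\nnt = LIMIT + 1;"
tokens = list(c_tokens(source, {"LIMIT": ["(", "10", ")"]}))
```

Because every level is a generator, nothing is read from a lower level until the parser asks the top level for another token, which is the calling pattern described above.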
Microsoft's sloppy interpretation of the preprocessor as a
self-contained stage revealed that the newer C standards effectively
disallow a stand-alone C preprocessor. Such a separate preprocessor can
synthesize tokens like "//" that could never occur in a strict (embedded)
implementation of the C standard. Even though the standard does not state
this explicitly, it follows as a side effect of the lexer implementation.
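
The following toy model sketches how a "//" can appear only in the stand-alone case. It is an illustration under assumptions, not a reconstruction of any particular compiler: the lexer, the empty `EMPTY` macro and the sample line are all hypothetical. The embedded variant lexes once and substitutes on tokens; the stand-alone variant substitutes, writes plain text without separators, and lets a second lexer read that text.

```python
import re
from typing import Dict, List


def lex(text: str) -> List[str]:
    """Toy lexer: drop // comments, then split into words and single characters."""
    tokens: List[str] = []
    for line in text.split("\n"):
        code = line.split("//", 1)[0]  # everything after // is a comment
        tokens += re.findall(r"\w+|\S", code)
    return tokens


def expand(tokens: List[str], macros: Dict[str, List[str]]) -> List[str]:
    """Object-like macro substitution on a token list (no rescanning)."""
    out: List[str] = []
    for tok in tokens:
        out += macros.get(tok, [tok])
    return out


def standalone(text: str, macros: Dict[str, List[str]]) -> List[str]:
    """Naive separate preprocessor: emit text, then lex that text again."""
    pieces = re.findall(r"\w+|\s+|\S", text)  # keep the original whitespace
    out_text = "".join("".join(macros.get(p, [p])) for p in pieces)
    return lex(out_text)


MACROS: Dict[str, List[str]] = {"EMPTY": []}  # hypothetical: expands to nothing
SOURCE = "x = a /EMPTY/ b;"

embedded = expand(lex(SOURCE), MACROS)  # the two '/' stay separate tokens
separate = standalone(SOURCE, MACROS)   # the text "x = a // b;" is lexed again
```

In the embedded case the two "/" tokens stay distinct, whereas the second lexer in the stand-alone case sees the text `x = a // b;` and discards the rest of the line as a comment. A stand-alone tool that writes a separator between adjacent tokens would avoid this particular case.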
DoDi