Re: State-of-the-art algorithms for lexical analysis?

Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Mon, 6 Jun 2022 08:59:47 +0200

          From comp.compilers

Related articles
State-of-the-art algorithms for lexical analysis? costello@mitre.org (Roger L Costello) (2022-06-05)
Re: State-of-the-art algorithms for lexical analysis? gah4@u.washington.edu (gah4) (2022-06-05)
Re: State-of-the-art algorithms for lexical analysis? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2022-06-06)
Re: State-of-the-art algorithms for lexical analysis? costello@mitre.org (Roger L Costello) (2022-06-06)
Re: State-of-the-art algorithms for lexical analysis? 480-992-1380@kylheku.com (Kaz Kylheku) (2022-06-06)
Re: State-of-the-art algorithms for lexical analysis? gah4@u.washington.edu (gah4) (2022-06-06)
State-of-the-art algorithms for lexical analysis? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-06-06)
Re: State-of-the-art algorithms for lexical analysis? gah4@u.washington.edu (gah4) (2022-06-06)
Re: State-of-the-art algorithms for lexical analysis? DrDiettrich1@netscape.net (Hans-Peter Diettrich) (2022-06-07)
[5 later articles]

From: Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups: comp.compilers
Date: Mon, 6 Jun 2022 08:59:47 +0200
Organization: Compilers Central
References: 22-06-006 22-06-007
Keywords: lex
Posted-Date: 06 Jun 2022 11:04:44 EDT
In-Reply-To: 22-06-007

On 6/6/22 1:05 AM, gah4 wrote:


> It has a completely different PSL, Pattern Specification Language,
> much more powerful than the usual regular expression.


I wonder about the need for powerful patterns in programming languages.
Most items (operators, punctuators, keywords) are fixed literals with a
fixed ID for use by the parser and code generator. If source code is
written by humans, the remaining token types (identifiers, literals,
comments) should not have an overly complicated syntax. For machine
generated source code no lexer is required at all: the generator can
emit the tokens for the parser directly. And if humans are supposed to
understand the code a generator produces, then again its syntax has to
be as simple and as easily understandable as possible.
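
To make the "fixed literal with a fixed ID" point concrete, a small
sketch (the names are made up): the bulk of the token set can live in a
static table that maps each spelling directly to the token code the
parser expects, so no pattern machinery is involved for those tokens.

    /* Most tokens are fixed spellings with fixed IDs; only identifiers,
       literals and comments need real pattern matching. */
    enum token_id { TOK_IF = 1, TOK_WHILE, TOK_PLUS, TOK_EQEQ, TOK_LPAREN };

    struct fixed_tok { const char *spelling; enum token_id id; };

    static const struct fixed_tok fixed_toks[] = {
        { "if",    TOK_IF     },
        { "while", TOK_WHILE  },
        { "+",     TOK_PLUS   },
        { "==",    TOK_EQEQ   },
        { "(",     TOK_LPAREN },
    };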




> Among others, PSL has the ability to define approximate matches,
> such as a word with one or more misspellings, that is insertions,
> deletions, or substitutions. Usual RE don't have that ability.


That's fine for keywords, but it does not help with user-defined
identifiers. Still a nice-to-have feature :-)
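
For illustration, a minimal sketch in plain C (not PSL; the function
name within_one_edit is made up) of what matching a word with at most
one misspelling, i.e. one insertion, deletion, or substitution, amounts
to:

    #include <string.h>

    /* Return 1 if strings a and b differ by at most one insertion,
       deletion, or substitution; 0 otherwise. */
    static int within_one_edit(const char *a, const char *b)
    {
        size_t la = strlen(a), lb = strlen(b);

        if (la == lb) {                        /* at most one substitution */
            int diffs = 0;
            for (size_t i = 0; i < la; i++)
                if (a[i] != b[i] && ++diffs > 1)
                    return 0;
            return 1;
        }
        if (la + 1 == lb) {                    /* swap so a is the longer one */
            const char *ts = a; a = b; b = ts;
            size_t tl = la; la = lb; lb = tl;
        } else if (la != lb + 1) {
            return 0;                          /* lengths differ by more than 1 */
        }
        /* a is one character longer than b: allow a single deletion */
        size_t i = 0, j = 0;
        int skipped = 0;
        while (i < la && j < lb) {
            if (a[i] == b[j]) { i++; j++; }
            else if (!skipped) { skipped = 1; i++; }
            else return 0;
        }
        return 1;
    }

A lexer could run an unknown word through such a check against the
keyword table before classifying it as an identifier; for user-defined
identifiers there is, as said, no table to check against.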


> There are also PSL expressions for ranges of numbers.
> You can often do that with very complicated RE, considering
> all of the possibilities. PSL automatically processes those
> possibilities. (Some can expand to complicated code.)


Is this feature really helpful to the user?
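
As an illustration of the number-range point (a sketch with a made-up
name, not PSL): to accept only 0..255 a plain RE has to spell out the
digit cases, e.g. ([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5]),
whereas a scanner can match any digit string and do the range check
numerically afterwards.

    #include <ctype.h>
    #include <stdlib.h>

    /* Match a decimal integer token and check the range 0..255 afterwards,
       instead of encoding the range in the regular expression itself. */
    static int scan_octet(const char *p, char **end, int *value)
    {
        if (!isdigit((unsigned char)*p))
            return 0;                  /* not a number token at all */
        long v = strtol(p, end, 10);
        if (v > 255)
            return 0;                  /* lexically a number, but out of range */
        *value = (int)v;
        return 1;
    }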




> I suspect that in many cases the usual RE is not optimal for
> lexical analysis, other than being well known.
>
> But as noted, DFA are likely the best way to do them.


ACK


> Though that could change with changes in computer hardware.


Or with the style of writing. APL already tried to simplify typing; in
the near future a Chinese programming language with a glyph for each
token (except literals) would eliminate the need for a lexer. Then,
instead of a lexer, a demand may arise for speech-to-text tools (and
the reverse) for each natural language.


DoDi
[Regular expressions have the advantage that once you've paid the one-time cost
of making a DFA, the matching is extremely fast. The lexer is usually one of
the slowest parts of a compiler, because it is the only part that has to look
at each character of the source program, so this is a place where speed
matters. Anyone know how fast PSLs are? I've seen fuzzy matchers but they
haven't been very fast. -John]
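
For concreteness, a generic sketch of the table-driven matching the
moderator describes (not the output of any particular lexer generator;
the names are made up). The transition table is built once from the
regular expressions; the matching loop then costs one array lookup per
input byte, with the usual longest-match bookkeeping.

    enum { NSTATES = 64, REJECT = -1 };

    static signed char next_state[NSTATES][256]; /* built once from the REs */
    static signed char accepting[NSTATES];       /* token code, 0 = none    */

    static int run_dfa(const unsigned char *p, const unsigned char *end,
                       const unsigned char **lexeme_end)
    {
        int state = 0, last_token = 0;
        const unsigned char *last_end = p;

        while (p < end) {
            int s = next_state[state][*p++];
            if (s == REJECT)
                break;                       /* no transition: stop scanning */
            state = s;
            if (accepting[state]) {          /* remember the longest match   */
                last_token = accepting[state];
                last_end = p;
            }
        }
        *lexeme_end = last_end;
        return last_token;                   /* 0 means no token matched */
    }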


