Re: State-of-the-art algorithms for lexical analysis?

gah4 <gah4@u.washington.edu>
Mon, 6 Jun 2022 10:03:55 -0700 (PDT)

          From comp.compilers


From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Date: Mon, 6 Jun 2022 10:03:55 -0700 (PDT)
Organization: Compilers Central
References: <Adh5kg76Z0xZslIuRRyzgUhteE2M6A==> 22-06-009
Keywords: lex
Posted-Date: 06 Jun 2022 15:57:06 EDT
In-Reply-To: 22-06-009

On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote:


(snip)


> I will look into PSL. There are algorithms for converting regexes to DFA
> and then using the DFA to tokenize the input. Are there algorithms for
> converting PSL to (what?) and then using the (what?) to tokenize the input?


The approximate searches are done with dynamic programming.
The penalty is 1 for each insertion, deletion, or substitution, and the
score is held in 3 bits, which allows up to six spelling errors.
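The post gives only the scoring rule, not the hardware details. As a rough software illustration, here is a minimal sketch of that kind of approximate search, assuming the textbook Sellers dynamic-programming recurrence (edit distance where a match may start anywhere in the text). It is not necessarily what the FDF implements. The 3-bit score is modelled as a counter that saturates at 7; reading 7 as "more than six errors" is an assumption chosen to fit the "up to six" figure. The names `approx_find`, `SCORE_BITS` and `MAX_SCORE` are hypothetical.

```python
SCORE_BITS = 3
# Assumption: a 3-bit score saturates at 7, which stands for "too many errors".
MAX_SCORE = (1 << SCORE_BITS) - 1


def approx_find(pattern: str, text: str, max_errors: int) -> list[int]:
    """Return the 0-based text indices where an approximate match of
    `pattern` ends, allowing up to `max_errors` edits (cost 1 each)."""
    if not 0 <= max_errors < MAX_SCORE:
        raise ValueError("max_errors must be below the saturation value")
    m = len(pattern)
    # col[i] holds D[i][j]: cheapest match of pattern[:i] ending at text[j].
    # Before any text is read, matching pattern[:i] costs i (all chars missing).
    col = [min(i, MAX_SCORE) for i in range(m + 1)]
    hits = []
    for j, c in enumerate(text):
        diag = col[0]   # D[0][j-1]
        col[0] = 0      # a match may start anywhere in the text
        for i in range(1, m + 1):
            left = col[i]  # D[i][j-1]
            col[i] = min(
                diag + (pattern[i - 1] != c),  # match or substitution
                col[i - 1] + 1,                # pattern char missing from text
                left + 1,                      # extra char in text
                MAX_SCORE,                     # saturate like a 3-bit counter
            )
            diag = left
        if col[m] <= max_errors:
            hits.append(j)
    return hits
```

For example, `approx_find("color", "the colour is", 1)` includes index 9, the end of "colour", since one insertion turns "color" into "colour". Saturating at 7 does not change any score below 7, so every count of six or fewer errors is still exact.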


The whole query is then compiled into code for a systolic array,
which runs as fast as the data comes off the disk.
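The post doesn't say how the FDF cells are wired. To show why a systolic array can keep pace with the disk, here is a hedged software simulation of a generic linear systolic array for edit distance. Each hypothetical `Cell` holds one pattern character. Text characters shift one cell per clock, and each cell reads only its own registers and its left neighbour's, so every cell does a constant amount of work per character no matter how long the text is. The class, its register names, the `systolic_search` function, and the 3-bit saturation at 7 are all assumptions made for illustration. `max_errors` should be below 7.

```python
from dataclasses import dataclass
from itertools import chain, repeat
from typing import Iterable, Iterator, Optional

MAX_SCORE = 7  # assumed 3-bit saturating score


@dataclass
class Cell:
    """One hypothetical processing element holding a single pattern char."""
    ch: str
    out: int                    # D[i][j] for the last text char processed
    prev_out: int               # D[i][j-1], kept for the right neighbour
    text: Optional[str] = None  # text char to pass right (None = bubble)
    pos: int = -1               # index of that text char


def systolic_search(pattern: str, chars: Iterable[str],
                    max_errors: int) -> Iterator[int]:
    """Yield end indices of approximate matches, one text char per clock."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # Before any text, D[i][-1] = i (saturated).
    cells = [Cell(ch, min(k + 1, MAX_SCORE), min(k + 1, MAX_SCORE))
             for k, ch in enumerate(pattern)]
    # Feed the text, then enough bubbles to drain the pipeline.
    feed = chain(enumerate(chars), repeat((-1, None), len(cells) - 1))
    for pos, c in feed:
        # Update right to left so each cell sees its left neighbour's
        # registers as they stood at the end of the previous clock.
        for k in range(len(cells) - 1, -1, -1):
            cell = cells[k]
            if k == 0:
                in_pos, in_ch, diag, above = pos, c, 0, 0  # row 0 is all zeros
            else:
                left = cells[k - 1]
                in_pos, in_ch = left.pos, left.text
                diag, above = left.prev_out, left.out      # D[i-1][j-1], D[i-1][j]
            cell.pos, cell.text = in_pos, in_ch
            if in_ch is None:
                continue  # bubble: keep scores unchanged
            new = min(diag + (cell.ch != in_ch), above + 1,
                      cell.out + 1, MAX_SCORE)
            cell.prev_out, cell.out = cell.out, new
        last = cells[-1]
        if last.text is not None and last.out <= max_errors:
            yield last.pos
```

The skew, where cell k handles text character j at clock j + k, is what keeps every dependency local. The cost is a fixed latency of one clock per pattern character, and the throughput stays one character per clock.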


FDF2 is a 9U VME board that runs in a VME-based Sun system.


FDF3 connects directly to a SCSI disk and also to a Sun workstation.
When searching, it reads data directly from the disk. To load data onto
the disk, the disk is accessed indirectly through the FDF3.
It is a desktop box, about the size of a large external SCSI disk.


Some of it is described here:


https://aclanthology.org/X93-1011.pdf


along with its use for searching Japanese text, and:


https://trec.nist.gov/pubs/trec3/papers/paper.ps.gz

