Re: Integer sizes and DFAs

gah4 <gah4@u.washington.edu>
Sat, 26 Mar 2022 19:32:17 -0700 (PDT)

          From comp.compilers

From: gah4 <gah4@u.washington.edu>
Newsgroups: comp.compilers
Date: Sat, 26 Mar 2022 19:32:17 -0700 (PDT)
Organization: Compilers Central
References: 22-03-073
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="97762"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, performance
Posted-Date: 26 Mar 2022 22:39:33 EDT
In-Reply-To: 22-03-073

On Saturday, March 26, 2022 at 4:42:55 PM UTC-7, Christopher F Clark wrote:


(snip)


> And, my point was 2**32 is large enough to be considered arbitrarily large with
> respect to most DFAs. Not quite the human genome, see extended analysis
> below. Here was my first analysis.


About 24 years ago I was working with a DNA sequencing group, and was
interested in speeding up this kind of matching. The approach I was most
interested in was special-purpose hardware with many of the largest DRAMs
I could find, arranged just to do this operation.


(Note that you need one more bit beyond the state number, to indicate when a match is found.)
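
To put a number on that extra bit, here is a rough sizing sketch. It assumes a dense table with one entry per (state, base) pair and a four-letter alphabet; neither detail is given above, and the function name is made up for illustration.

```python
def dfa_table_bits(num_states, alphabet_size=4):
    """Total bits for a dense DFA transition table (assumed layout).

    Each entry holds a next-state number plus one extra bit that
    flags a match.
    """
    # Bits needed to number the states 0 .. num_states - 1.
    state_bits = max(1, (num_states - 1).bit_length())
    entry_bits = state_bits + 1  # the one extra match bit
    return num_states * alphabet_size * entry_bits


# Example: a million-state DFA over A, C, G, T.
bits = dfa_table_bits(2**20)
print(bits // 8, "bytes")
```

With 2**32 states the same formula gives 33-bit entries, so the match bit is what pushes an entry past a 32-bit word.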


There would be logic to read data off disk and pass it directly to the DFA
array. There would then be logic to store the offset into the disk file, and
the state at which the hit occurred, to be read out later.
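
In software, that data path might look something like the sketch below. It is only an illustration under assumptions not stated above: the table is a dense array indexed by (state, base), the top bit of each entry is the match flag, and the DFA is a simple single-pattern matcher built by brute force. The names `build_table` and `scan` are hypothetical.

```python
MATCH_BIT = 1 << 31          # assumed layout: top bit flags a match
STATE_MASK = MATCH_BIT - 1   # low 31 bits hold the next state
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def build_table(pattern):
    """Dense DFA that recognizes every occurrence of `pattern`.

    Built by brute force for clarity: from state s (s symbols matched)
    on a given base, the next state is the longest prefix of the
    pattern that is a suffix of what has been seen.
    """
    m = len(pattern)
    table = []
    for s in range(m + 1):
        row = []
        for base in "ACGT":
            seen = pattern[:s] + base
            k = min(m, len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row.append(k | (MATCH_BIT if k == m else 0))
        table.append(row)
    return table


def scan(table, chunks):
    """Stream chunks through the DFA, logging (offset, state) per hit.

    `offset` counts symbols consumed so far, so a logged offset points
    just past the end of the match. Symbols other than A, C, G, T are
    not handled here and would raise KeyError.
    """
    state = 0
    offset = 0
    hits = []
    for chunk in chunks:          # stands in for blocks read off disk
        for ch in chunk:
            entry = table[state][BASE_INDEX[ch]]
            state = entry & STATE_MASK
            offset += 1
            if entry & MATCH_BIT:
                hits.append((offset, state))
    return hits


table = build_table("ACA")
hits = scan(table, ["GAC", "ACA"])   # same stream, split across two reads
```

The state carries over between chunks, so a match that straddles two reads is still found. Only the hit log has to be read back afterwards.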


But we went on to other projects, and I never got to build one.


Since then, DRAM has gotten much larger, but so has the DNA database.


Yes, the human genome is about 3 gigabases, but the whole of GenBank is
now about 16 terabases, including WGS (whole genome shotgun) sequences.

