Re: Parallel lexers by chunking the input

anton@mips.complang.tuwien.ac.at (Anton Ertl)
Fri, 25 Mar 2022 08:04:59 GMT

          From comp.compilers

Related articles
Parallel lexers by chunking the input christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-03-24)
Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (2022-03-25)
Re: Parallel lexers by chunking the input gah4@u.washington.edu (gah4) (2022-03-25)
Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (2022-03-25)
Parallel lexers by chunking the input. christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-03-26)
Parallel lexers by chunking the input christopher.f.clark@compiler-resources.com (Christopher F Clark) (2022-03-26)
Re: Parallel lexers by chunking the input anton@mips.complang.tuwien.ac.at (2022-03-26)
| List of all articles for this month |
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.compilers
Date: Fri, 25 Mar 2022 08:04:59 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
References: 22-03-058 22-03-064
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="97958"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, parallel, comment
Posted-Date: 25 Mar 2022 07:45:38 EDT

Christopher F Clark <christopher.f.clark@compiler-resources.com> writes:
>But, the real issue is strings, comments, and markdown like features.
>They all suffer from the same issue, which is that they are generally
>unbounded strings that start and stop with very short sequences. And,
>by picking some arbitrary point in the future to start scanning, you
>cannot be sure whether you are inside or outside one of those
>features.


Yes. But you can represent this uncertainty in the state of the DFA.
Essentially you can take the original DFA, and make all states of that
(nondeterministic) start states of our continuation DFA. Process the
pieces of the input in parallel with the continuation automaton.


Now you know the end state of the original automaton the
first piece, and the end state of the continuation automaton of the
second piece; you can look up the end state of the original automaton
for the second piece from these informations. And so on for the
further pieces. This is a sequential component, but it is fast.


Now you can reprocess the second-to-last pieces in parallel using the
original automaton and performing the actions (e.g., converting
numbers into binary representation).


Not sure how to interface that to the parser, but if we have a similar
idea for parallelizing the parser, even the lex/flex interface might
be ok.


- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/
[Experiments would be useful here. Yes, you can do this, but do you
end up throwing away so much work that it's not worth it? -John]


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.