RE: Packrat parsers and memoization

"Quinn Tyler Jackson" <quinn_jackson2004@yahoo.ca>
Thu, 11 Jun 2009 07:06:59 -0700

          From comp.compilers

Related articles
RE: Packrat parsers and memoization quinn_jackson2004@yahoo.ca (Quinn Tyler Jackson) (2009-06-08)
Re: Packrat parsers and memoization kay@fiber-space.de (Kay Schluehr) (2009-06-09)
RE: Packrat parsers and memoization quinn_jackson2004@yahoo.ca (Quinn Tyler Jackson) (2009-06-11)
| List of all articles for this month |
From: "Quinn Tyler Jackson" <quinn_jackson2004@yahoo.ca>
Newsgroups: comp.compilers
Date: Thu, 11 Jun 2009 07:06:59 -0700
Organization: Compilers Central
References: 09-06-042
Keywords: parse
Posted-Date: 11 Jun 2009 10:27:07 EDT

Kay Schluehr said:


> I'm interested myself in memoization techniques for the CFG top down
> parsers I build. Those have quite good left factorization capabilities
> and use techniques analog to those of K.Thompson for parsing regular
> expressions but there are occasions where they run into backtracking
> mode as well.


The following Meta-S grammar was constructed to test the Meta-S engine's
memoization system.


grammar X
{
S ::= #grammar(memoize) // setting this to nomemoize would turn off
memoization


#memo_enter
(S1 "^") | (S1 "#") | (S1 "@") | (S1 "!") | (S1 "$")
#memo_leave ; // S


S1 ::= S2<+>; // S1
S2 ::= K; // S2
S3 ::= K; // S3
Z ::= ((A [G] B [G] C) | (A [G] B [G] D [G] F) | (A [G] B [G] D [G] E))
"."; // Z
Y ::= (( G | A | B | C | D | E | F)<+>) "!"; // Y
X ::= ((A [G] B [G] C) | (A [G] B [G] D [G] F) | (A [G] B [G] D [G] E))
"@"; // X
K ::= (Y | X | Z) "."; // K
A ::= aa; // A
aa ::= pp pp; // aa
pp ::= "a"; // pp
B ::= "b"; // B
C ::= "c"; // C
D ::= "d"; // D
E ::= "e"; // E
F ::= "f"; // F
G ::= H | "G"; // G
H ::= A; // H
};


Given the input:


aabde..aabdf..aabde..aabde..aabdf..aabde..aabde..aabdf..$


a worst-case backtracking situation is created, since $ as the terminating
symbol is the final check.


For the purposes of the following a "parse step" is an actual descent/match
rather than a memo-table fetch. Obviously a memo-fetch is not
computationally free, but one can assume, using hashing, that a memo-fetch
occurs in nearly constant time.


With memoization turned off, the parse was effected in 1745 actual parse
steps. With memoization turned on, the parse was effected in 17 actual parse
steps. The input was then doubled to:


aabde..aabdf..aabde..aabde..aabdf..aabde..aabde..aabdf..aabde..aabdf..aabde.
.aabde..aabdf..aabde..aabde..aabdf..$


With memoization turned off, the number of actual parse steps was 3485. With
memoization turned on, the number of actual parse steps was 17. (As it was
with the shorter string.)


The input was then doubled yet again to:


aabde..aabdf..aabde..aabde..aabdf..aabde..aabde..aabdf..aabde..aabdf..aabde.
.aabde..aabdf..aabde..aabde..aabdf..aabde..aabdf..aabde..aabde..aabdf..aabde
..aabde..aabdf..aabde..aabdf..aabde..aabde..aabdf..aabde..aabde..aabdf..$


Non-memoized parse steps: 6965
Memoized parse steps: 17


To give an idea of times, the last case takes 12 milliseconds
unmemoized and 1 millisecond memoized -- this due to the fact that
memoization is not computationally free, even though it is causing
descent/match logic to be bypassed.


Wait a second -- the number of parse steps in the memoized version is
constant -- how is that possible? ;-)


Yes, indeed -- the non-memoized version of the parse displays the
expected complexity behavior, but the memoized version displays
constant complexity with this grammar and these inputs.


The enhancements that allow for this behavior are collectively code
named Meta-Janus and the implementation details/algorithms are a
company trade secret, but the point is that memoization indeed can be
a very useful parsing optimization.


--
Quinn Tyler Jackson
Chief Scientist
Thothic Technology Partners, LLC



Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.