Related articles |
---|
--- Regexps, compilers etc., prog help --- sjain@a.chem.upenn.edu (1997-02-11) |
Re: --- Regexps, compilers etc., prog help --- the_artful_parser@null.net (Quinn Tyler Jackson) (1997-02-16) |
Re: --- Regexps, compilers etc., prog help --- pfox@lehman.com (Paul David Fox) (1997-02-16) |
Re: --- Regexps, compilers etc., prog help --- jmccarty@sun1307.spd.dsccc.com (1997-02-20) |
Re: --- Regexps, compilers etc., prog help --- nkramer@cs.cmu.edu (Nick Kramer) (1997-02-20) |
Re: --- Regexps, compilers etc., prog help --- ok@cs.rmit.edu.au (1997-02-22) |
Re: --- Regexps, compilers etc., prog help --- kanze@gabi-soft.fr (1997-03-14) |
Re: --- Regexps, compilers etc., prog help --- the_artful_parser@null.net (1997-03-16) |
Re: --- Regexps, compilers etc., prog help --- henry@zoo.toronto.edu (Henry Spencer) (1997-03-21) |
Re: --- Regexps, compilers etc., prog help --- henry@zoo.toronto.edu (Henry Spencer) (1997-03-21) |
From: | the_artful_parser@null.net (Quinn Tyler Jackson) |
Newsgroups: | comp.compilers |
Date: | 16 Mar 1997 23:36:19 -0500 |
Organization: | Parse City |
References: | 97-02-072 97-02-122 97-02-125 97-03-066 |
Keywords: | parse |
On 14 Mar 1997 00:34:37 -0500, in comp.compilers J. Kanze wrote:
>Concerning the moderator's comment: most regular expression's I see
>these days have been extended to allow saving part of the match. I
>think that this requires back-tracking (or does anyone know of a one
>pass solution that works in all cases).
n
I'm not particularly fond of the syntax for saving part of an RE
match. Cosnider an RE that matches any identifier I followed by a
colon, followed by that same identifier:
(I):\1 (that's \one, not \ell)
I implemented the same semantic in LPM as follows:
[^
<I>
['the_identifier'^
[':'$
['the_identifier'!
Internally, a stack of offsets is used -- when the [^ is encountered,
the current offset of the intermediate match is pushed to this stack.
When the corresponding ['foobar'^ is encountered, that offset is then
used in conjunction with the second intermediate offset -- the two
values are used to construct a temporary substring, which is pushed to
a string stack and associated with the set 'foobar'.
The ['foobar'! clause matches any member in the set 'foobar'. Since
that substring stack is given precedence over permanent members of the
set 'foobar' -- the above pattern matches "apple:apple" but not
"apple:orange"
This is done at the expense of creating temporary substrings for
internal use. Would those temporary substrings be considered
backgracking?
> Even with normal regular expressions, there are a number of tradeoffs
> between memory use, execution time and flexibility.
You can't have it all. ;-) In LPM, failure to match is more 'costly'
than success in a match. Any large stream can be efficiently scanned
for almost any complexity of pattern that is somewhere in the stream
-- but the cost of determining that a pattern ISN'T in the stream is
high in version 1.0 of the engine. Version 2.0 uses "polymorphism"
and initial tests have shown that this alone speeds up things
considerably.
> Enough to make me think that there still will be occasions for
> writing your own parser.
Which is what I did. <wink>
--
email: the_artful_parser@null.net
url: http://mypage.direct.ca/q/qjackson/
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.