Re: --- Regexps, compilers etc., prog help ---

the_artful_parser@null.net (Quinn Tyler Jackson)
16 Mar 1997 23:36:19 -0500

          From comp.compilers

Related articles
--- Regexps, compilers etc., prog help --- sjain@a.chem.upenn.edu (1997-02-11)
Re: --- Regexps, compilers etc., prog help --- the_artful_parser@null.net (Quinn Tyler Jackson) (1997-02-16)
Re: --- Regexps, compilers etc., prog help --- pfox@lehman.com (Paul David Fox) (1997-02-16)
Re: --- Regexps, compilers etc., prog help --- jmccarty@sun1307.spd.dsccc.com (1997-02-20)
Re: --- Regexps, compilers etc., prog help --- nkramer@cs.cmu.edu (Nick Kramer) (1997-02-20)
Re: --- Regexps, compilers etc., prog help --- ok@cs.rmit.edu.au (1997-02-22)
Re: --- Regexps, compilers etc., prog help --- kanze@gabi-soft.fr (1997-03-14)
Re: --- Regexps, compilers etc., prog help --- the_artful_parser@null.net (1997-03-16)
Re: --- Regexps, compilers etc., prog help --- henry@zoo.toronto.edu (Henry Spencer) (1997-03-21)
Re: --- Regexps, compilers etc., prog help --- henry@zoo.toronto.edu (Henry Spencer) (1997-03-21)
| List of all articles for this month |
From: the_artful_parser@null.net (Quinn Tyler Jackson)
Newsgroups: comp.compilers
Date: 16 Mar 1997 23:36:19 -0500
Organization: Parse City
References: 97-02-072 97-02-122 97-02-125 97-03-066
Keywords: parse

On 14 Mar 1997 00:34:37 -0500, in comp.compilers J. Kanze wrote:


>Concerning the moderator's comment: most regular expression's I see
>these days have been extended to allow saving part of the match. I
>think that this requires back-tracking (or does anyone know of a one
>pass solution that works in all cases).
n
I'm not particularly fond of the syntax for saving part of an RE
match. Cosnider an RE that matches any identifier I followed by a
colon, followed by that same identifier:


        (I):\1 (that's \one, not \ell)


I implemented the same semantic in LPM as follows:


        [^
              <I>
        ['the_identifier'^
        [':'$
        ['the_identifier'!


Internally, a stack of offsets is used -- when the [^ is encountered,
the current offset of the intermediate match is pushed to this stack.
When the corresponding ['foobar'^ is encountered, that offset is then
used in conjunction with the second intermediate offset -- the two
values are used to construct a temporary substring, which is pushed to
a string stack and associated with the set 'foobar'.


The ['foobar'! clause matches any member in the set 'foobar'. Since
that substring stack is given precedence over permanent members of the
set 'foobar' -- the above pattern matches "apple:apple" but not
"apple:orange"


This is done at the expense of creating temporary substrings for
internal use. Would those temporary substrings be considered
backgracking?


> Even with normal regular expressions, there are a number of tradeoffs
> between memory use, execution time and flexibility.


You can't have it all. ;-) In LPM, failure to match is more 'costly'
than success in a match. Any large stream can be efficiently scanned
for almost any complexity of pattern that is somewhere in the stream
-- but the cost of determining that a pattern ISN'T in the stream is
high in version 1.0 of the engine. Version 2.0 uses "polymorphism"
and initial tests have shown that this alone speeds up things
considerably.


> Enough to make me think that there still will be occasions for
> writing your own parser.


Which is what I did. <wink>
--
email: the_artful_parser@null.net
url: http://mypage.direct.ca/q/qjackson/
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.