From: "Quinn Tyler Jackson" <qjackson@wave.home.com>
Newsgroups: comp.compilers
Date: 2 Aug 1998 23:56:53 -0400
Organization: Compilers Central
References: 98-07-217
Keywords: parse, WWW
>I'm looking for pointers how to construct a HTML-parser, which can be
>configured by selection of elements.
>
>To clarify: I want to parse stock notations from the web. The user should
>be able to select a site and then designate several fields like the ticker
>symbol, the order volume etc.
I am currently putting the finishing touches on version 2.0 of Laleh's
Pattern Matcher, and will be making it available for "all use" on a
no-redistribution license. (I.e., use it where you want to, commercial,
academic, or otherwise, just don't redistribute the source.) There
will be a simple HTML parser included as an example that may provide
you with a place to get started on that. Here's an example of the
HTML pattern:
<HTML(a)>
-- This macro finds any <a href=""> ... </a> pair and retrieves the
text between the tags. It could just as easily have been
<HTML(head)> or <HTML(title)>.
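
For readers without LPM at hand, the behavior described for the macro (find a tag pair, return what sits between the tags) can be approximated with an ordinary regular expression. This is only a rough sketch in Python, not LPM syntax: the helper name html_pair and the sample page string are made up for illustration, and the pattern does not handle nested tags of the same name.

```python
import re

def html_pair(tag: str) -> re.Pattern:
    """Build a regex for <tag ...> ... </tag> that captures the inner text.

    Hypothetical helper for illustration only; it is not part of LPM.
    """
    name = re.escape(tag)
    return re.compile(
        rf"<{name}\b[^>]*>(.*?)</{name}\s*>",
        re.IGNORECASE | re.DOTALL,  # tag names are case-insensitive; content may span lines
    )

# Made-up sample input resembling a stock quote page fragment.
page = '<p>Quote: <a href="/q?s=IBM">IBM</a> 1,200 shares</p>'

match = html_pair("a").search(page)
inner = match.group(1) if match else None
print(inner)
```

Passing "head" or "title" instead of "a" gives the equivalent of the other two macros mentioned above.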
LPM C++ v2.0's biggest addition is what I am calling polymorphic
patterns, an addition that was inspired by Dr. Francoise Balmas'
Ph.D. work on pattern matching. (You can find a link to her work
in the Computer Science section on my homepage.)
In their first incarnation, PPs in LPM allow less expensive patterns
to prescreen a stream before more expensive patterns are applied. I
ran an HTML test parse a few minutes ago in which an HTML tag pair is
placed at the very end of a 100 KB string of non-HTML filler. Averaged
over 100 tests, with no compiler optimizations turned on, the
polymorphic pattern was 191 times faster than the non-polymorphic
one. Timings to find the <a ... > ... </a> tag pair were:
P-LPM -- 206 milliseconds
LPM -- 39525 milliseconds
Timings like this have convinced me that the polymorphic enhancements
were worth pursuing.
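
The prescreening idea can be sketched outside LPM as well: run a cheap test first, and only hand the stream to the expensive pattern from the point where a match becomes possible. The Python below is a minimal illustration under my own assumptions (a plain substring search for "<" as the cheap pattern, a regex as the expensive one). It does not show how LPM implements polymorphic patterns, and it will not reproduce the timings above.

```python
import re

# The "expensive" pattern: a full tag-pair match with a captured body.
EXPENSIVE = re.compile(r"<a\b[^>]*>(.*?)</a\s*>", re.IGNORECASE | re.DOTALL)

def find_anchor_plain(text):
    """Apply the expensive pattern to the whole stream."""
    return EXPENSIVE.search(text)

def find_anchor_prescreened(text):
    """Cheap prescreen first, then the expensive pattern on what is left."""
    start = text.find("<")  # cheap test: no "<" means no tag can match
    if start == -1:
        return None
    # Begin the expensive search at the first position that could match.
    return EXPENSIVE.search(text, start)

# Same shape as the test described above: filler, then one tag pair at the end.
stream = "x" * 100_000 + '<a href="u">link</a>'

plain = find_anchor_plain(stream)
screened = find_anchor_prescreened(stream)
print(screened.group(1))
```

Both functions return the same match; the only difference is how much of the stream the expensive pattern has to examine.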
All that's left for me to do is add the "PRIVATE", "PROTECTED", and
"PUBLIC" inheritance modifiers, and to write the docs that explain how
to use LPM. It might be a week or two before version 2.0 is up for
ftp.
--
Quinn Tyler Jackson
email: qjackson@wave.home.com
url: http://www.qtj.net/~quinn/
ftp: qtj.net
--