Table-driven Parser email@example.com (Cobus Kruger) (2005-09-30)
Re: Table-driven Parser firstname.lastname@example.org (Paul Mann) (2005-10-02)
Re: Table-driven Parser email@example.com (Dmitry A. Kazakov) (2005-10-02)
Re: Table-driven Parser Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-10-02)
Re: Table-driven Parser DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-10-02)
From: "Cobus Kruger" <firstname.lastname@example.org>
Date: 30 Sep 2005 02:04:52 -0400
Keywords: parse, question, comment
I have been writing an RTF to PDF converter to correct issues we had
with software that exports our reports. Fortunately, I don't need
anything too complex - basic fonts, indenting and the like.
Unfortunately I have to admit that I am a little light on parsing
theory, so the basic algorithm has been a fairly ad hoc affair. It has
worked well for the most part, but I keep finding new tags that mess
up my rendition of the document if I ignore them.
I read in Microsoft's documentation that most RTF parsers are table
driven. To me that doesn't say much. I then downloaded their test app
which has a horde of lines making up tables, but doesn't seem to do
anything meaningful. Certainly, it is not building any kind of DOM or
giving me any of the info I would need to render it on-screen.
Do any of you have knowledge of, or experience with, these kinds of
parsers? Any starting point would be very much appreciated.
[I've written RTF to HTML translators which tokenize the RTF and then
do a lot of ad-hoc stuff to figure out what it means. RTF never
struck me as having enough of a structure to merit a real parser, and
I definitely sympathize with the gotcha tags problem. -John]
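
For readers wondering what "table driven" can look like for RTF, here is a
minimal sketch in Python. It is not Microsoft's sample reader and not the
approach of anyone in this thread; the names TOKEN, TABLE, State and parse
are made up for illustration, and the table holds only a handful of control
words. The idea is that the tokenizer knows nothing about meaning, and a
lookup table says what each control word does: toggle a property, set a
value, emit a character, or skip a whole group. Unknown control words are
ignored, and a group that starts with \* followed by an unknown word is
skipped, which is one way to keep unfamiliar tags from corrupting the
output. Hex escapes (\'hh), \u and \bin are deliberately left out.

```python
import re
from dataclasses import dataclass, replace

# One token per match: control word (+ optional numeric parameter),
# control symbol, group brace, or a run of plain text.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, parameter, one delimiter space
    r"|\\(.)"                   # control symbol such as \* \{ \} \\
    r"|([{}])"                  # group open / close
    r"|([^\\{}]+)",             # plain text
    re.DOTALL,
)

@dataclass(frozen=True)
class State:
    """Formatting state; saved on '{' and restored on '}'."""
    bold: bool = False
    italic: bool = False
    font_size: int = 24   # half-points
    skip: bool = False    # True while inside a group we do not render

# The "table": control word -> (kind of action, target). Illustrative only.
TABLE = {
    "b": ("toggle", "bold"),
    "i": ("toggle", "italic"),
    "fs": ("value", "font_size"),
    "par": ("char", "\n"),
    "tab": ("char", "\t"),
    "fonttbl": ("skip", None),
    "colortbl": ("skip", None),
    "stylesheet": ("skip", None),
    "info": ("skip", None),
}

def parse(rtf):
    """Return a list of (text, State) runs for the renderable text."""
    runs, stack, state = [], [], State()
    ignorable = False  # set by \*: skip the group if the next word is unknown
    for m in TOKEN.finditer(rtf):
        word, param, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(state)
        elif brace == "}":
            state = stack.pop() if stack else State()
        elif state.skip:
            continue
        elif symbol is not None:
            if symbol == "*":
                ignorable = True
            elif symbol in "{}\\":          # escaped literal character
                runs.append((symbol, state))
        elif word is not None:
            default = ("skip" if ignorable else "ignore", None)
            kind, target = TABLE.get(word, default)
            ignorable = False
            if kind == "toggle":            # \b turns on, \b0 turns off
                state = replace(state, **{target: param != "0"})
            elif kind == "value" and param is not None:
                state = replace(state, **{target: int(param)})
            elif kind == "char":
                runs.append((target, state))
            elif kind == "skip":
                state = replace(state, skip=True)
        elif text is not None:
            text = text.replace("\r", "").replace("\n", "")  # raw CR/LF carry no meaning
            if text:
                runs.append((text, state))
    return runs

SAMPLE = (r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}{\*\generator Foo;}"
          r"\fs28 Plain {\b bold\b0  not} text\par}")

for run_text, run_state in parse(SAMPLE):
    print(repr(run_text), run_state)
```

Supporting a new tag then means adding a row to TABLE rather than another
special case in the parsing loop. The list of runs is roughly the
information a renderer needs; building a tree instead would mean opening a
node on "{" and closing it on "}" at the points where the state is pushed
and popped.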