From: | Hans-Peter Diettrich <DrDiettrich@compuserve.de> |
Newsgroups: | comp.compilers |
Date: | 2 Oct 2005 02:54:50 -0400 |
Organization: | Compilers Central |
References: | 05-09-142 |
Keywords: | parse |
Posted-Date: | 02 Oct 2005 02:54:50 EDT |
Cobus Kruger wrote:
> I read in Microsoft's documentation that most RTF parsers are table
> driven. To me that doesn't say much. I then downloaded their test app
> which has a horde of lines making up tables, but doesn't seem to do
> anything meaningful. Certainly, it is not building any kind of DOM or
> giving me any of the info I would need to render it on-screen.

My experience with MS and RTF is as follows:
MS has published a description of a safe RTF parser, but they
obviously never used such an approach themselves. Every new Word
version had its own syntax errors in the RTF output files, so that
each time a new help compiler had to be provided that could handle
those version-specific errors. I stopped tracking these problems
after the Windows help system changed from RTF to HTML sources.

[I've written RTF to HTML translators which tokenize the RTF and then
do a lot of ad-hoc stuff to figure out what it means. RTF never
struck me as having enough of a structure to merit a real parser, and
I definitely sympathize with the gotcha tags problem. -John]
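
To make the "tokenize first, interpret later" approach concrete, here is a
minimal sketch in Python. It is not the Microsoft sample reader and not
anybody's production code. The token names, the two small lookup tables,
and the helper names `tokenize` and `plain_text` are made up for
illustration, and the `\'hh` escapes are assumed to be Windows-1252 bytes.
The lookup tables are what "table driven" amounts to here: control words
are looked up rather than handled by hand-written branches.

```python
import re

# One alternative per RTF token kind. Order matters: control words and
# \'hh escapes must be tried before the generic control-symbol case.
TOKEN_RE = re.compile(
    r"""
    (?P<open>\{)
  | (?P<close>\})
  | \\(?P<word>[A-Za-z]+)(?P<param>-?\d+)?\x20?   # control word, optional number, optional delimiter space
  | \\'(?P<hex>[0-9A-Fa-f]{2})                    # escaped byte
  | \\(?P<symbol>[^A-Za-z])                       # control symbol
  | (?P<text>[^\\{}]+)                            # run of plain text
    """,
    re.VERBOSE,
)


def tokenize(rtf):
    """Yield (kind, value) pairs for an RTF string."""
    pos = 0
    while pos < len(rtf):
        m = TOKEN_RE.match(rtf, pos)
        if m is None:  # only a lone backslash at end of input gets here
            raise ValueError(f"bad RTF at offset {pos}")
        pos = m.end()
        if m.group("open"):
            yield ("open", None)
        elif m.group("close"):
            yield ("close", None)
        elif m.group("word"):
            param = m.group("param")
            yield ("word", (m.group("word"), None if param is None else int(param)))
        elif m.group("hex"):
            yield ("hex", int(m.group("hex"), 16))
        elif m.group("symbol") is not None:
            yield ("symbol", m.group("symbol"))
        else:
            # Raw CR/LF in the file carry no meaning in RTF.
            text = m.group("text").replace("\r", "").replace("\n", "")
            if text:
                yield ("text", text)


# Illustrative tables (assumptions): destinations whose content is not
# body text, and control words that stand for a character.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}
WORD_TO_TEXT = {"par": "\n", "line": "\n", "tab": "\t"}


def plain_text(rtf):
    """Extract body text, tracking per-group 'skip' state on a stack."""
    out = []
    skip = [False]
    for kind, value in tokenize(rtf):
        if kind == "open":
            skip.append(skip[-1])          # a group inherits its parent's state
        elif kind == "close":
            if len(skip) > 1:
                skip.pop()
        elif kind == "word":
            name, _param = value
            if name in SKIP_DESTINATIONS:
                skip[-1] = True
            elif not skip[-1]:
                out.append(WORD_TO_TEXT.get(name, ""))
        elif kind == "symbol":
            if value == "*":               # \* marks an ignorable destination
                skip[-1] = True
            elif value in "\\{}" and not skip[-1]:
                out.append(value)          # escaped literal character
        elif kind == "hex" and not skip[-1]:
            # Assumption: the document uses the Windows-1252 code page.
            out.append(bytes([value]).decode("cp1252", errors="replace"))
        elif kind == "text" and not skip[-1]:
            out.append(value)
    return "".join(out)


sample = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Hello, \b world\b0 !\par}"
print(repr(plain_text(sample)))
```

Everything beyond this (formatting state, tables, fields, and the
version-specific quirks mentioned above) is where the ad-hoc work starts;
the sketch only shows that the lexical layer itself is regular.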

Insofar as RTF sources result from WinWord documents, I think that
knowledge of the corresponding WinWord document structure is helpful
for figuring out which WinWord construct the RTF tags apply to as
attributes. It may also help to determine the WinWord version that
produced an RTF document, in order to apply the appropriate
heuristics. Once the RTF is converted into a Word-equivalent
representation, it may be possible to determine the representation
that was really meant for the text, apart from possible errors in
the RTF output.

IOW: write an RTF to DOC decompiler... ;-)
Or have a look at the RTF import procedures in some OpenOffice sources.
DoDi