From: Pete Jinks <pjj@cs.man.ac.uk>
Newsgroups: comp.compilers
Date: 2 May 2004 21:54:38 -0400
Organization: Computer Science Dept, University of Manchester
References: 04-04-076
Keywords: parse
Posted-Date: 02 May 2004 21:54:38 EDT
jacktow@hotmail.com (Mansoor) wrote:
> I surprisingly don't seem to be able to find a clear explanation of
> "How to lexically analyse a chunk of text data".
I don't think this is quite what you are looking for, but if you don't
want to wait for a better tool to be invented:
http://www.cs.man.ac.uk/~pjj/complang/usinglex.html
describes an "Example of Processing Text using Lex alone",
including dealing with errors in various ways.
(Please let me know if you have any suggestions as to how to
make the web-page more useful.)
It also includes links to examples of handling simple mark-up:
"to approximately translate text between various formats
(e.g. from html to ASCII and from roff to html)."
http://www.cs.man.ac.uk/~pjj/complang/html2txt.l
http://www.cs.man.ac.uk/~pjj/complang/ms2html.l
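For a feel of how such an approximate translator behaves without running lex, here is a hypothetical minimal sketch in Python. It is not the contents of html2txt.l: the rule set, the entity table and the name `html_to_text` are illustrative assumptions. It imitates lex's disambiguation: at each position every rule is tried, the longest match wins, and ties go to the rule listed first.

```python
import re

# Hypothetical entity table; only a few common entities are handled.
ENTITIES = {"&lt;": "<", "&gt;": ">", "&amp;": "&", "&nbsp;": " "}

# Hypothetical rules in lex order: (pattern, action producing output text).
RULES = [
    (re.compile(r"<(?:br|BR|p|P)\s*/?>"), lambda m: "\n"),  # line/paragraph break
    (re.compile(r"<[^<>]*>"), lambda m: ""),                # any other tag: drop it
    (re.compile(r"&[a-z]+;"), lambda m: ENTITIES.get(m.group(), m.group())),
    (re.compile(r"[^<&]+"), lambda m: m.group()),           # plain text: copy
    (re.compile(r"[<&]"), lambda m: m.group()),             # stray '<' or '&': copy
]

def html_to_text(src: str) -> str:
    out = []
    pos = 0
    while pos < len(src):
        best = None
        for pattern, action in RULES:
            m = pattern.match(src, pos)
            # Strict '>' keeps the earlier rule on a tie, as lex does.
            if m and (best is None or m.end() > best[0].end()):
                best = (m, action)
        # Every character is matched by one of the last two rules,
        # so best is never None and each step consumes input.
        m, action = best
        out.append(action(m))
        pos = m.end()
    return "".join(out)
```

For example, `html_to_text("<P>Hello <B>world</B> &amp; you<BR>")` gives `"\nHello world & you\n"`, while a lone `<` that never closes is simply copied through.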
I suppose that, to deal with your example:
<FONT color=black
<I>something</I>
you would need explicit error-handling regular expressions like:
"<"[^<]*
or maybe:
"<FONT"[^<>]*">"	/* OK */
"<FONT"[^<>]*/"<"	/* error, but do your best to cope */
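The two FONT rules can be tried out with a hypothetical Python sketch. Python's `re` has no lex-style trailing context, so `/"<"` is modelled with a lookahead `(?=<)`. As in flex, the trailing context counts toward the match length when choosing the longest match, but it is then left in the input. The rules are written with `[^<>]*` rather than `[^<]*`. With `[^<]*`, a well-formed `<FONT color=black>text<I>` would let the error rule match further (up to the next `<`) and so win under longest-match. The extra `tag`, `text` and `stray` rules and the name `scan` are assumptions added to make the example complete.

```python
import re

# (token name, pattern, length of trailing context counted for longest match)
RULES = [
    ("font_ok",       re.compile(r"<FONT[^<>]*>"), 0),
    ("font_unclosed", re.compile(r"<FONT[^<>]*(?=<)"), 1),  # models "<FONT"[^<>]*/"<"
    ("tag",           re.compile(r"<[^<>]*>"), 0),          # hypothetical: other tags
    ("text",          re.compile(r"[^<]+"), 0),             # hypothetical: plain text
    ("stray",         re.compile(r"<"), 0),                 # hypothetical: lone '<'
]

def scan(src: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(src):
        best = None
        for name, pattern, context_len in RULES:
            m = pattern.match(src, pos)
            if m:
                # Trailing context counts toward length, like flex.
                length = m.end() - pos + context_len
                if best is None or length > best[0]:  # ties: earlier rule wins
                    best = (length, name, m)
        _, name, m = best
        tokens.append((name, m.group()))
        pos = m.end()  # the trailing '<' is not consumed
    return tokens
```

On the example from the question, `scan("<FONT color=black\n<I>something</I>")` reports the unclosed tag as `font_unclosed` and then carries on with `<I>`, `something` and `</I>` as ordinary tokens, which is the "do your best to cope" behaviour.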
--
Peter J. Jinks, Room 2.99, Department of Computer Science,
University of Manchester, Oxford Road, Manchester, M13 9PL, U.K.
(+44/0)161-275 6186 http://www.cs.man.ac.uk/~pjj