Re: Parsing a text stream

Pete Jinks <pjj@cs.man.ac.uk>
2 May 2004 21:54:38 -0400

          From comp.compilers

Newsgroups: comp.compilers
Organization: Computer Science Dept, University of Manchester
References: 04-04-076
Keywords: parse

jacktow@hotmail.com (Mansoor) wrote:
> I surprisingly don't seem to be able to find a clear explanation of
> "How to lexically analyse a chunk of text data".


I don't think this is quite what you are looking for, but if you don't
want to wait for a better tool to be invented:
http://www.cs.man.ac.uk/~pjj/complang/usinglex.html
describes an "Example of Processing Text using Lex alone",
including dealing with errors in various ways.
(Please let me know if you have any suggestions as to how to
make the web-page more useful.)


It also includes links to examples of handling simple mark-up:
  "to approximately translate text between various formats
  (e.g. from html to ASCII and from roff to html)."
http://www.cs.man.ac.uk/~pjj/complang/html2txt.l
http://www.cs.man.ac.uk/~pjj/complang/ms2html.l
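For anyone who wants to see the "approximate" html-to-ASCII idea without lex, here is a minimal sketch in C. It is not a translation of html2txt.l. The function name html_to_text, the choice of decoding only &lt;, &gt; and &amp;, and the truncate-to-fit behaviour are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from html2txt.l): approximately convert HTML
   to plain ASCII by dropping everything from '<' up to the next '>' and
   decoding three common entities. Output is truncated to fit `cap` bytes,
   including the terminating NUL. Returns the length of the output. */
size_t html_to_text(const char *in, char *out, size_t cap)
{
    /* Entities recognised by this sketch; any other '&' is copied as-is. */
    static const struct { const char *name; char ch; } entities[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
    };
    const size_t n_entities = sizeof entities / sizeof entities[0];
    size_t n = 0;
    int in_tag = 0;

    assert(cap > 0);
    for (const char *p = in; *p != '\0' && n + 1 < cap; p++) {
        if (in_tag) {
            if (*p == '>')
                in_tag = 0;          /* end of tag: resume copying text */
            continue;
        }
        if (*p == '<') {
            in_tag = 1;              /* start of tag: stop copying */
            continue;
        }
        if (*p == '&') {
            size_t i;
            for (i = 0; i < n_entities; i++) {
                size_t len = strlen(entities[i].name);
                if (strncmp(p, entities[i].name, len) == 0) {
                    out[n++] = entities[i].ch;
                    p += len - 1;    /* the loop's p++ skips the ';' */
                    break;
                }
            }
            if (i < n_entities)
                continue;            /* entity was decoded */
        }
        out[n++] = *p;               /* ordinary text character */
    }
    out[n] = '\0';
    return n;
}
```

For example, html_to_text("<p>1 &lt; 2 &amp;&amp; <b>x</b></p>", buf, sizeof buf) leaves "1 < 2 && x" in buf. Note that an unclosed tag silently swallows text up to the next '>', which is why malformed input such as an unterminated <FONT needs explicit handling.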


I suppose that, to deal with your example:
<FONT color=black
<I>something</I>
you would need explicit error-handling regular expressions like:
  "<"[^<]*
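or maybe:
  "<FONT"[^<>]*">" /* OK */
  "<FONT"[^<>]*/"<" /* error, but do your best to cope */
(">" is excluded from the repeated class because flex counts trailing
context when choosing the longest match: with [^<]* the error rule would
also match a well-formed "<FONT ...>" that is immediately followed by
"<", one character longer than the OK rule, and so would win.)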
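The same recovery idea can be sketched as a hand-written C scanner, which may help if you are not using lex at all. This is an illustrative sketch, not the author's code: the names next_token, struct token and the TOK_* kinds are hypothetical, and it generalises from <FONT to any tag. A tag runs from '<' up to the first '>'; if another '<' (or the end of input) arrives first, the tag is reported as TOK_BAD_TAG and ends just before that point, so scanning resumes at the new '<'. Unlike a trailing-context rule, this sketch also treats a tag cut off by end of input as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token kinds produced by this hypothetical scanner. */
enum kind { TOK_END, TOK_TEXT, TOK_TAG, TOK_BAD_TAG };

struct token {
    enum kind kind;
    const char *start;   /* points into the input; not NUL-terminated */
    size_t len;
};

/* Scan one token starting at *pos and advance *pos past it. */
enum kind next_token(const char **pos, struct token *t)
{
    assert(pos != NULL && *pos != NULL && t != NULL);
    const char *p = *pos;

    t->start = p;
    if (*p == '\0') {
        t->kind = TOK_END;                /* nothing left to scan */
    } else if (*p == '<') {
        p += 1 + strcspn(p + 1, "<>");    /* skip characters other than < and > */
        if (*p == '>') {
            p++;                          /* include the closing '>' */
            t->kind = TOK_TAG;
        } else {
            t->kind = TOK_BAD_TAG;        /* hit '<' or end of input first */
        }
    } else {
        p += strcspn(p, "<");             /* text runs up to the next '<' */
        t->kind = TOK_TEXT;
    }
    t->len = (size_t)(p - t->start);
    *pos = p;
    return t->kind;
}
```

On the example input "<FONT color=black\n<I>something</I>", successive calls return TOK_BAD_TAG for "<FONT color=black\n", then TOK_TAG for "<I>", TOK_TEXT for "something", TOK_TAG for "</I>", and finally TOK_END. The caller decides how to "do its best to cope" with a TOK_BAD_TAG, for instance by treating it as if the missing '>' had been present.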


--
Peter J. Jinks, Room 2.99, Department of Computer Science,
University of Manchester, Oxford Road, Manchester, M13 9PL, U.K.
(+44/0)161-275 6186 http://www.cs.man.ac.uk/~pjj

