Parsing a text stream

jacktow@hotmail.com (Mansoor)
28 Apr 2004 14:38:41 -0400

          From comp.compilers

Related articles
Parsing a text stream jacktow@hotmail.com (2004-04-28)
Re: Parsing a text stream mailbox@dmitry-kazakov.de (Dmitry A.Kazakov) (2004-04-29)
Re: Parsing a text stream gopi@sankhya.com (2004-05-02)
Re: Parsing a text stream pjj@cs.man.ac.uk (Pete Jinks) (2004-05-02)
Re: Parsing a text stream Postmaster@paul.washington.dc.us (Paul Robinson) (2004-05-24)

From: jacktow@hotmail.com (Mansoor)
Newsgroups: comp.compilers
Date: 28 Apr 2004 14:38:41 -0400
Organization: http://groups.google.com
Keywords: lex, question
Posted-Date: 28 Apr 2004 14:38:41 EDT

Hi,


Surprisingly, I can't seem to find a clear explanation of
"how to lexically analyse a chunk of text data".


What I'm looking for is just a bit different from what I've
found so far. For example, using a parser such as GOLD Parser with a
grammar for, let's say, HTML, we can parse an HTML file and tokenize it.


However, here is the problem. These parsers only make it to the
end of the data as long as everything goes according to plan. Say you
have left an HTML tag open, e.g. "<FONT color=black
<I>something</I>", where the FONT tag is not closed with a corresponding
">". None of the lexical analysers I have found so far can
handle this sort of situation. If you ask why they should, the
answer is a text editor: somebody is not done with the code
yet, the syntax highlighting feature is supposed to ease the writer's
task, and so even unfinished tokens must be highlighted.
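
To make the requirement concrete, here is a minimal sketch in Python of
the behaviour I have in mind. It is only an illustration and does not
come from any existing tool: the function name tokenize and the token
kinds "tag", "unclosed_tag" and "text" are made up for this example. The
assumption is that a tag counts as unfinished when another "<" or the
end of the input shows up before its ">".

```python
def tokenize(text):
    """Split text into (kind, lexeme) pairs without ever failing on bad input."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        if text[i] == "<":
            j = i + 1
            # Scan to the closing '>'. Hitting another '<' or the end of
            # the input first means the tag was left open (an assumption
            # made for this sketch, not a rule of HTML).
            while j < n and text[j] not in "<>":
                j += 1
            if j < n and text[j] == ">":
                tokens.append(("tag", text[i:j + 1]))
                i = j + 1
            else:
                tokens.append(("unclosed_tag", text[i:j]))
                i = j
        else:
            # Plain text runs up to the next '<' or the end of the input.
            j = i
            while j < n and text[j] != "<":
                j += 1
            tokens.append(("text", text[i:j]))
            i = j
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("<FONT color=black <I>something</I>"):
        print(kind, repr(lexeme))
```

The unfinished FONT tag comes out as its own token instead of stopping
the scan, so an editor could still colour it, and joining all the
lexemes gives back the original text exactly.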


I already have one idea, which is to use regular expressions. The problem
with regexes, however, is that we can only search for and find a match. We
can't recognize the parts and sections of code, let's say in a C
program, such as a function body or any other section made up of
logical sub-parts.
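
As a rough illustration of that gap, the Python sketch below uses a
regex only to pick out individual tokens, and needs a separate depth
counter on top of it to find where a brace-delimited body starts and
ends. The names TOKEN and block_spans are hypothetical, and the sketch
makes a big simplifying assumption: braces inside strings, character
literals and comments are not treated specially.

```python
import re

# One regex alternative per token class; the final '.' catches anything else.
TOKEN = re.compile(r"[{}]|[A-Za-z_]\w*|\s+|.", re.DOTALL)


def block_spans(source):
    """Return (start, end, closed) for each top-level {...} region."""
    spans = []
    depth = 0
    start = None
    for match in TOKEN.finditer(source):
        tok = match.group()
        if tok == "{":
            if depth == 0:
                start = match.start()
            depth += 1
        elif tok == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, match.end(), True))
                start = None
        # A stray '}' at depth 0 is simply ignored.
    if depth > 0:
        # The body was still open at the end of the input: report it as
        # unfinished rather than giving up.
        spans.append((start, len(source), False))
    return spans


if __name__ == "__main__":
    print(block_spans("int f() { if (x) { y(); } }"))
    print(block_spans("int f() { if (x) { y();"))
```

The regex on its own has no idea which "}" belongs to which "{". The
counter is what recovers the body, and it can also report a body that is
still open at the end of the input.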


So, in summary, what I'm looking for is something with the capability of a
lexical parser that is at the same time able to handle errors
(whether by telling it how, or by adding a new error-handling mechanism) such
as the one I described above.


If anyone is at all familiar with syntax highlighting (my actual
goal) or with parsing in general, I would be very pleased to hear any
suggestions, help, recommendations, etc.


Thanks for your time


Regards,


Mansour Behabadi
[I'm not aware of any good way to parse snippets of code other than
ad-hoc regex hacks. Hey, PhD students, get on it. -John]


