Online Parsing

Hans-Peter Diettrich <>
24 Dec 2006 23:28:27 -0500

          From comp.compilers

Related articles
Online Parsing (Hans-Peter Diettrich) (2006-12-24)
| List of all articles for this month |

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: 24 Dec 2006 23:28:27 -0500
Organization: Compilers Central
Keywords: parse, question
Posted-Date: 24 Dec 2006 23:28:27 EST

Principally I'm searching for hints on the implementation of syntax
highlighting and hyperlink navigation. The according grammars are
(should be) not very complex, so that they possibly can be interpreted
at runtime.

I've already implemented a line-based parser, which can be invoked
whenever a line has to be displayed. The parser returns an token
identifier, and the location of the token in the given line text. The
token identifier is used to select the appearance of the token in the
visual representation, and also can be used to indicate the symbol
table, that is applicable for hyperlink navigation or, in an extended
version, for spell checking or a code completion feature.

Until here it's more a lexer than a parser. For the handling of
multi-line constructs, like block comments, the parser is fed with an
initial state at the begin of the line, and returns the state after the
parse of a token. In a first pass, over the whole text, the parser state
at the begin of every line is stored in an array, for later use when a
single line has to be parsed again.

This model works fine for immutable texts, but now I want to extend it
for use in an editor, where the text can change and might be in a
somewhat syntactically incorrect state (unmatched pairs of parentheses
etc.). Another intended option is the implementation of kind of
multi-level grammars, so that e.g. HTML/XML tags can be parsed, with the
new requirement(?) for an stack of states. Perhaps I need a separation
into an lexer and an parser, so that now two distinct states have to be
stored, for every possible starting point in the input text?

What should be considered, with regards to the complexity of the
grammars, and the possible separation into distinct lexer and parser stages?

What are somewhat obvious restrictions or problems with the above (line
based) implementation? Would it be easier to store the tokens, instead
of the lexer/parser states, so that the parser has to be invoked only
whenever the text has changed?

Thanks for all comments

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.