Re: Parser generator for package based protocols

Hans-Peter Diettrich <>
9 May 2006 00:52:59 -0400

          From comp.compilers

Related articles
Parser generator for package based protocols (2006-04-28)
Re: Parser generator for package based protocols (Chris F Clark) (2006-04-30)
Re: Parser generator for package based protocols (Hans-Peter Diettrich) (2006-05-09)
Re: Parser generator for package based protocols (Detlef Meyer-Eltz) (2006-05-12)
| List of all articles for this month |

From: Hans-Peter Diettrich <>
Newsgroups: comp.compilers
Date: 9 May 2006 00:52:59 -0400
Organization: Compilers Central
References: 06-04-169
Keywords: parse
Posted-Date: 09 May 2006 00:52:59 EDT

The original poster wrote:

> I work with a lot of different package based communication protocols
> and would like to use some tool to generate parsers for these. The
> protocols are normally in the form of header+data where the header
> describes the length and type of the data and the data can contain new
> header+data structs.

I.e. you have to deal with multiple languages, one for every protocol
level? And possibly distinct languages or scanners for the header and
the data parts?

> Are there really no tools for this, or am I searching with the wrong
> terminology?

Your input IMO deserves a scannerless parser, or a multi-level grammar,
or at least a parser with multiple dedicated scanners.

> What I want is a way to define the protocol in a grammar in a way that
> allows the tokens to be defined by the length information in the data
> before the token and generate a parser that will give me a parse tree
> of the data package.

I'd write a "dispatcher" that splits the stream into distinct parts,
e.g. by evaluating the length information. Every part is then sent to
the appropriate handler (parser...) for further processing, and the
dispatcher constructs the final parse tree from the subtrees returned
by the dedicated handlers.
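Such a dispatcher could be sketched like this (a minimal Python
illustration, not anyone's actual code; the 4-byte header layout of a
2-byte type followed by a 2-byte big-endian length, and the two handler
types, are assumptions chosen for the example):

```python
import struct

HEADER = struct.Struct(">HH")  # assumed layout: type, payload length

def parse_leaf(payload):
    # Placeholder: a real handler would run a dedicated parser/scanner
    # over the payload and return its subtree.
    return ("leaf", payload)

def parse_nested(payload):
    # The data part may itself contain header+data structs,
    # so recurse through the dispatcher.
    return ("nested", dispatch(payload))

HANDLERS = {0x01: parse_leaf, 0x02: parse_nested}

def dispatch(data):
    """Split the stream into parts by the length field, hand each part
    to its handler, and collect the returned subtrees into one tree."""
    tree = []
    pos = 0
    while pos < len(data):
        ptype, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        if len(payload) < length:
            raise ValueError("truncated packet")
        pos += length
        tree.append(HANDLERS[ptype](payload))
    return tree
```

For example, a type-0x02 packet wrapping a type-0x01 packet yields a
two-level tree: `dispatch(HEADER.pack(0x02, 6) + HEADER.pack(0x01, 2)
+ b"hi")` returns `[("nested", [("leaf", b"hi")])]`.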

> [Parser generators like yacc handle streams of tokens, not streams of
> bytes, and need something else to create the token streams from the
> input. To tokenize typical human-readable languages we use flex which
> does regular expression matching, but it's straightforward to write
> your own lexer, which I suspect is what people do here. -John]


