Subject: Re: terminological problem (EBNF & regular expressions)
From: "Paul Mann" <paul@parsetec.com>
Newsgroups: comp.compilers
Date: 14 Oct 2005 17:21:08 -0400
Organization: Compilers Central
References: 05-10-082
Keywords: syntax, design
Posted-Date: 14 Oct 2005 17:21:08 EDT

"Detlef Meyer-Eltz" <Meyer-Eltz@t-online.de> wrote ...
> The TextTransformer uses for the tokens POSIX style (matching the
> longest input) regular expressions with '*', '+' and '?' for repeats
> and options and the same syntax is used for the non backtracking
> productions.
>
> How can I describe both kinds of syntax briefly and correctly?

With LRgen 6.0, I handled this problem by separating the lexer
grammar from the parser grammar. Both grammars allow the same
regular expression notation. There is no confusion because the
grammars are in separate files. The parser generator reads the
parser grammar first and modifies token numbers in the lexer
grammar, then reads the lexer grammar. It generates the parser
tables and lexer tables as separate files, which are then
included in the source code and compiled.

I have posted most of the documentation for the parser grammars,
but not yet for the lexer grammars. If you want to take a look,
it's at: http://parsetec.com/lrgen

I allow lexer grammars to be specified in full grammar notation
and, consequently, the number of regular expressions required is
minimal. For example, here is part of a lexer grammar:

   <identifier> -> Letter
                -> <identifier> Letter
                -> <identifier> Digit

   Letter       -> 'a'..'z' + 'A'..'Z' + '_'

   <integer>    -> Digit...

   Digit        -> '0'..'9'

   <spaces>     -> Space...

   Space        -> \9 + \10 + \32

   <comment1>   -> '/' '*' EndInAst '/'

   EndInAst     -> '*'
                -> Inside '*'
                -> EndInAst NotS '*'
                -> EndInAst NotS Inside '*'

   Inside       -> NotA
                -> Inside NotA
                -> <comment1>           // nested comment line
                -> Inside <comment1>    // nested comment line

   NotA         -> \32 .. \126 + \9 + \10 - '*'
   NotS         -> \32 .. \126 + \9 + \10 - '/'

   <comment2>   -> '/' '/'
                -> '/' '/' NotEOL...

   NotEOL       -> \32 .. \126 + \9

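
As a rough illustration of what these rules recognize, here is a small hand-written scanner in Python. It is only a sketch and not LRgen output. It assumes that `..` denotes a character range, `+` a set union, `-` a set difference and `...` one-or-more repetition. The names `char_range`, `run`, `scan_comment1` and `tokenize` are made up for this example. The nested comment is approximated with a depth counter, so corner cases such as `/*/*/` may be handled differently by a lexer generated from the grammar, and the comment body is not restricted to the printable character set that NotA and NotS describe.

```python
def char_range(lo, hi):
    """Set of characters from lo to hi inclusive (the grammar's '..')."""
    return {chr(c) for c in range(ord(lo), ord(hi) + 1)}


# Character sets; '|' plays the role of the grammar's '+' (assumed to be union).
LETTER = char_range('a', 'z') | char_range('A', 'Z') | {'_'}
DIGIT = char_range('0', '9')
SPACE = {chr(9), chr(10), chr(32)}
NOT_EOL = char_range(chr(32), chr(126)) | {chr(9)}


def run(text, pos, charset):
    """Index just past the longest run of charset members starting at pos."""
    while pos < len(text) and text[pos] in charset:
        pos += 1
    return pos


def scan_comment1(text, pos):
    """End index of a possibly nested /* ... */ comment at pos, or -1 if unterminated."""
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith('/*', i):
            depth += 1          # opening a (nested) comment
            i += 2
        elif text.startswith('*/', i):
            depth -= 1          # closing the innermost open comment
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    return -1


def tokenize(text):
    """Split text into (kind, lexeme) pairs, taking the longest match for each token."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch in LETTER:
            kind, end = 'identifier', run(text, pos + 1, LETTER | DIGIT)
        elif ch in DIGIT:
            kind, end = 'integer', run(text, pos, DIGIT)
        elif ch in SPACE:
            kind, end = 'spaces', run(text, pos, SPACE)
        elif text.startswith('/*', pos):
            kind, end = 'comment1', scan_comment1(text, pos)
            if end < 0:
                raise ValueError(f'unterminated comment at {pos}')
        elif text.startswith('//', pos):
            kind, end = 'comment2', run(text, pos + 2, NOT_EOL)
        else:
            raise ValueError(f'unexpected character {ch!r} at {pos}')
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

For instance, `tokenize("x1 42")` yields an identifier, a run of spaces and an integer, and `tokenize("/* a /* b */ c */")` yields a single `comment1` token covering the whole nested comment.
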
Paul Mann