Re: terminological problem (EBNF & regular expressions)

"Paul Mann" <paul@parsetec.com>
14 Oct 2005 17:21:08 -0400

          From comp.compilers


Newsgroups: comp.compilers
Organization: Compilers Central
References: 05-10-082
Keywords: syntax, design

"Detlef Meyer-Eltz" <Meyer-Eltz@t-online.de> wrote ...


> The TextTransformer uses for the tokens POSIX style (matching the
> longest input) regular expressions with '*', '+' and '?' for repeats
> and options and the same syntax is used for the non backtracking
> productions.
>
> How can I describe both kinds of syntax briefly and correctly?


With LRgen 6.0, I handled this problem by separating the lexer
grammar from the parser grammar. Both grammars accept the same
regular expression notation, and there is no confusion because
the grammars are in separate files. The parser generator (PG)
reads the parser grammar first and modifies the token numbers in
the lexer grammar, then reads the lexer grammar. The PG writes
the parser tables and the lexer tables to separate files, which
are then included with the source code and compiled.
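
The two-pass arrangement can be sketched roughly as follows. This is a
hypothetical Python illustration, not LRgen code: it assumes the parser
grammar's terminals are numbered 1..n in the order they are listed, and
that lexer-only tokens the parser never sees (such as <spaces> or the
comment tokens) get the number 0, meaning "discard". The function name
assign_token_numbers is made up for the example.

```python
# Hypothetical sketch of sharing token numbers between the two grammars.
# Assumptions (not from LRgen): parser terminals are numbered 1..n in the
# order the parser grammar lists them; lexer-only tokens get 0 ("discard").

def assign_token_numbers(parser_terminals: list[str],
                         lexer_tokens: list[str]) -> dict[str, int]:
    """Pass 1: number the parser's terminals. Pass 2: stamp those numbers
    onto the lexer grammar's token definitions."""
    numbers = {name: i for i, name in enumerate(parser_terminals, start=1)}
    return {name: numbers.get(name, 0) for name in lexer_tokens}


table = assign_token_numbers(
    ["<identifier>", "<integer>"],
    ["<identifier>", "<integer>", "<spaces>", "<comment1>"],
)
print(table)
# {'<identifier>': 1, '<integer>': 2, '<spaces>': 0, '<comment1>': 0}
```

Reading the parser grammar first is what makes this work: the lexer
tables can only be emitted once the numbers the parser expects are known.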


I have posted most of the documentation for the parser grammars,
but not for the lexer grammars yet. If you want to take a look,
it's at: http://parsetec.com/lrgen


I allow lexer grammars to be specified with complete grammar
notation; consequently, the number of regular expressions
required is minimal. For example, here is part of a lexer
grammar:


<identifier> -> Letter
             -> <identifier> Letter
             -> <identifier> Digit
Letter -> 'a'..'z' + 'A'..'Z' + '_'
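
The three <identifier> alternatives describe the same language as the
regular expression [A-Za-z_][A-Za-z0-9_]*. A minimal Python sketch,
assuming '+' between character ranges means set union, '..' is an
inclusive range, and the lexer takes the longest match; match_identifier
and the set names are hypothetical, not LRgen output.

```python
import string

# Letter -> 'a'..'z' + 'A'..'Z' + '_'   ('+' read as set union)
LETTER = frozenset(string.ascii_lowercase) | frozenset(string.ascii_uppercase) | {"_"}
# Digit -> '0'..'9'
DIGIT = frozenset(string.digits)


def match_identifier(text: str, pos: int = 0) -> int:
    """Return the end index of the longest <identifier> starting at pos,
    or -1 if there is none. The left-recursive alternatives
    <identifier> -> <identifier> Letter | <identifier> Digit
    become a loop that keeps extending the match."""
    if pos >= len(text) or text[pos] not in LETTER:
        return -1
    end = pos + 1
    while end < len(text) and (text[end] in LETTER or text[end] in DIGIT):
        end += 1
    return end


print(match_identifier("foo_1 bar"))  # 5
```

Left recursion, which is fine for an LR generator, turns into simple
iteration when the rule is hand-coded.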


<integer> -> Digit...
Digit -> '0'..'9'


<spaces> -> Space...
Space -> \9 + \10 + \32
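
The <integer> and <spaces> rules use the '...' suffix. A small Python
sketch, assuming 'X...' means one or more X and that \9, \10 and \32
are decimal character codes (tab, line feed, space); the helper names
are hypothetical.

```python
DIGIT = frozenset("0123456789")
# Space -> \9 + \10 + \32   (assumed decimal codes: tab, line feed, space)
SPACE = frozenset(chr(c) for c in (9, 10, 32))


def match_repeat(charset: frozenset, text: str, pos: int = 0) -> int:
    """Longest run of one or more characters from charset starting at
    pos (the assumed meaning of 'X...'). Returns end index or -1."""
    end = pos
    while end < len(text) and text[end] in charset:
        end += 1
    return end if end > pos else -1


def match_integer(text: str, pos: int = 0) -> int:  # <integer> -> Digit...
    return match_repeat(DIGIT, text, pos)


def match_spaces(text: str, pos: int = 0) -> int:   # <spaces> -> Space...
    return match_repeat(SPACE, text, pos)


print(match_integer("123abc"), match_spaces(" \t\nx"))  # 3 3
```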


<comment1> -> '/' '*' EndInAst '/'
EndInAst -> '*'
         -> Inside '*'
         -> EndInAst NotS '*'
         -> EndInAst NotS Inside '*'
Inside -> NotA
       -> Inside NotA
       -> <comment1>        // nested comment line
       -> Inside <comment1> // nested comment line
NotA -> \32 .. \126 + \9 + \10 - '*'
NotS -> \32 .. \126 + \9 + \10 - '/'
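
For comparison, a hand-written scanner for nested /* ... */ comments
usually keeps a depth counter instead of a grammar. The Python sketch
below swaps in that technique; it is not a translation of the grammar
above or of LRgen's tables. It assumes '-' means set difference, so the
allowed characters are printable ASCII (\32..\126) plus tab and line
feed. Where an inner "/*" could be read either as plain text or as a
nested comment, this sketch always treats it as a nested opener, so it
may accept or reject some edge cases differently from the grammar.

```python
# NotA / NotS draw from \32..\126 plus \9 and \10: printable ASCII, tab, LF.
COMMENT_CHARS = frozenset(chr(c) for c in range(32, 127)) | {"\t", "\n"}


def match_comment1(text: str, pos: int = 0) -> int:
    """Return the end index of a nested /* ... */ comment starting at pos,
    or -1 if there is none (or it is unterminated)."""
    if not text.startswith("/*", pos):
        return -1
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # opens this comment or a nested one
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # closes the innermost open comment
            i += 2
            if depth == 0:
                return i
        elif text[i] in COMMENT_CHARS:
            i += 1
        else:
            return -1           # character outside the allowed set
    return -1                   # ran out of input inside a comment


print(match_comment1("/* a /* b */ c */ rest"))  # 17
```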


<comment2> -> '/' '/'
           -> '/' '/' NotEOL...
NotEOL -> \32 .. \126 + \9
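
Taken together, the two <comment2> alternatives mean "//" followed by
zero or more NotEOL characters, so the comment stops at the first line
feed. A short Python sketch, assuming the same decimal-code reading of
\9 and the '..' range; match_comment2 is a hypothetical name.

```python
# NotEOL -> \32 .. \126 + \9   (printable ASCII plus tab; no line feed)
NOT_EOL = frozenset(chr(c) for c in range(32, 127)) | {"\t"}


def match_comment2(text: str, pos: int = 0) -> int:
    """Return the end index of a // comment starting at pos, or -1.
    '/' '/' alone covers the empty comment; '/' '/' NotEOL... covers
    the rest, so together they allow zero or more NotEOL characters."""
    if not text.startswith("//", pos):
        return -1
    end = pos + 2
    while end < len(text) and text[end] in NOT_EOL:
        end += 1
    return end


print(match_comment2("// hi\nx"))  # 5
```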




Paul Mann

