|Building a scanner(or pattern matcher) from a CFG grammar? firstname.lastname@example.org (Georgios Petasis) (2004-12-16)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? email@example.com (Ira Baxter) (2004-12-17)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? firstname.lastname@example.org (Georgios Petasis) (2004-12-19)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? email@example.com (christopher diggins) (2004-12-19)|
|Re: Building a scanner(or pattern matcher) from a CFG grammar? firstname.lastname@example.org (christopher diggins) (2004-12-22)|
|From:||"Ira Baxter" <email@example.com>|
|Date:||17 Dec 2004 00:32:09 -0500|
|Posted-Date:||17 Dec 2004 00:32:09 EST|
"Georgios Petasis" <firstname.lastname@example.org> wrote
> I am trying to find software that can convert a CFG grammar into a
> scanner, for locating patterns in text. I don't want a full parse, but
> find patterns in free text (like for example lex-based scanner do),
> but with the pattern language being CFG instead of regular. Any ideas?
Well, if you want to use a CFG, then you can't avoid "parsing" in the
conventional sense of the word. There's an easy way to do this: build
a parser whose tokens are individual characters; then you can use
"YACC". This may be rather inefficient, but perhaps you don't care.
Eelco Visser did a lot of interesting work on "scannerless" parsers
that are driven by arbitrary CFGs. Google for "scannerless parsers".
Much of this work has been packaged up in "XT", which is an academic
program transformation tool; you can find this at.
http://catamaran.labs.cs.uu.nl/twiki/pt/bin/view/Tools/XT. You don't
have to use all the machinery; you could use just the parser generator
part. (I haven't used these tools personally, but I have great
respect for the people that built them).
Our experience is that when you have enough complexity to justify
parsing (as opposed to regular expressions), that the "text" you are
searching really does break into words, and so conventional lexing
techniques apply. This helps make the process efficient. We do
arbitrary pattern recognition on arbitrary structured text using
arbitrary CFGs with our DMS Software Reengineering Toolkit.
Can you describe why you don't want to have a lexer?
Return to the
Search the comp.compilers archives again.