Re: Building a scanner(or pattern matcher) from a CFG grammar?

"Georgios Petasis" <petasis@iit.demokritos.gr>
19 Dec 2004 23:50:18 -0500

          From comp.compilers

Related articles
Building a scanner(or pattern matcher) from a CFG grammar? petasis@iit.demokritos.gr (Georgios Petasis) (2004-12-16)
Re: Building a scanner(or pattern matcher) from a CFG grammar? idbaxter@semdesigns.com (Ira Baxter) (2004-12-17)
Re: Building a scanner(or pattern matcher) from a CFG grammar? petasis@iit.demokritos.gr (Georgios Petasis) (2004-12-19)
Re: Building a scanner(or pattern matcher) from a CFG grammar? cdiggins@videotron.ca (christopher diggins) (2004-12-19)
Re: Building a scanner(or pattern matcher) from a CFG grammar? cdiggins@videotron.ca (christopher diggins) (2004-12-22)
| List of all articles for this month |

From: "Georgios Petasis" <petasis@iit.demokritos.gr>
Newsgroups: comp.compilers
Date: 19 Dec 2004 23:50:18 -0500
Organization: National Technical University of Athens, Greece
References: 04-12-064 04-12-083
Keywords: parse, lex
Posted-Date: 19 Dec 2004 23:50:18 EST

Dear Ira,


Thank you very much for your reply. I am downloading the
"XT" tools to see what they provide.


What I am trying to do is to use a CFG for pattern matching.
I have a corpus, and I want to identify some specific text portions in it.
I dump all the annotated text portions I want to learn how to locate, and
using a grammatical inference algorithm of my own, I create a
CFG grammar from the text portions. What I am missing is a
way to locate the text portions covered by the (generalised) CFG
grammar.


I cannot use a parser (like yacc) for this, as I don't want to match
the whole corpus, just parts of it. A way would have been with a
parser that does "partial matches", but my tests with PCPATR for
example have shown that the return parse trees are not so
trustworthy. So, I am looking for a solution for this problem...
(Also, I think that yacc cannot handle arbitrary CFGs...)


What is the "DMS Software Reengineering Toolkit" you are reffering at
the end of your post? Is it something I can try? Right now, the only
solutions I have found is the XT tools (which I haven't yet evaluated)
and the GrammarForge toolkit, which seems to have a parser capable of
working in scanning mode.


Best wishes,


George


"Ira Baxter" <idbaxter@semdesigns.com> wrote


>> I am trying to find software that can convert a CFG grammar into a
>> scanner, for locating patterns in text. I don't want a full parse, but
>> find patterns in free text (like for example lex-based scanner do),
>> but with the pattern language being CFG instead of regular. Any ideas?
>
> Well, if you want to use a CFG, then you can't avoid "parsing" in the
> conventional sense of the word. There's an easy way to do this: build
> a parser whose tokens are individual characters; then you can use
> "YACC". This may be rather inefficient, but perhaps you don't care.
>
> Eelco Visser did a lot of interesting work on "scannerless" parsers
> that are driven by arbitrary CFGs. Google for "scannerless parsers".
> Much of this work has been packaged up in "XT", which is an academic
> program transformation tool; you can find this at.
> http://catamaran.labs.cs.uu.nl/twiki/pt/bin/view/Tools/XT. You don't
> have to use all the machinery; you could use just the parser generator
> part. (I haven't used these tools personally, but I have great
> respect for the people that built them).
>
> Our experience is that when you have enough complexity to justify
> parsing (as opposed to regular expressions), that the "text" you are
> searching really does break into words, and so conventional lexing
> techniques apply. This helps make the process efficient. We do
> arbitrary pattern recognition on arbitrary structured text using
> arbitrary CFGs with our DMS Software Reengineering Toolkit.
>
> Can you describe why you don't want to have a lexer?


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.