Related articles:
Boolean grammar (javacc) rwatkinsNOSPAM@NOSPAMfoo-bar.org (Robert Watkins) (2003-11-11)
From: Robert Watkins <rwatkinsNOSPAM@NOSPAMfoo-bar.org>
Newsgroups: comp.compilers,comp.lang.java.programmer
Date: 11 Nov 2003 23:06:03 -0500
Organization: foo-bar.org
Keywords: parse, Java, question
Posted-Date: 11 Nov 2003 23:06:03 EST
I posted a similar message to comp.compilers.tools.javacc a few days ago,
but the traffic in that group is almost nil, so I could be waiting some time
for a response. Please forgive me if you think this is the wrong group.
Anyway, I wrote my first grammar over the weekend, for Boolean queries.
Never having done this before, and not having any compiler references
(other than the javacc documentation), I have no idea if what I have done
is flawed or terribly inefficient. As well, although I am fortunate
enough to work from home, one downside is that I'm pretty much in a
vacuum, and can't turn to someone at the next desk to ask this sort of
thing.
So, if anyone's got a minute to look the following over, I welcome
comments (those that are meant to be helpful, anyway).
I've only included the grammar productions (with their actions), as I believe the TOKEN section is okay.
Mostly I'm unsure if I've implemented the NEAR expression correctly and
don't know if the AND expression is constructed in an
efficient/elegant/sensible manner.
The NEAR operator can only take two operands (i.e. apples NEAR oranges)
and can be in any of 3 formats (case not important): NEAR, NEARn or
NEAR/n (where n is a positive integer, the default being 6). That part is
not difficult; what I'm not sure of is whether using
( LOOKAHEAD(2) nearExpr() | term() )
is the best approach. Is there a way to avoid the lookahead? It seems odd
to me to include nearExpr() in unaryExpr(), but that's the only way I
could see to make it work.
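On avoiding the LOOKAHEAD: a common alternative is to left-factor the rule, i.e. always parse the first term and only then decide, on the very next token, whether a NEAR operator follows. In JavaCC notation that would be roughly `( term() [ nearOp() term() ] ) #NEAR(>1)` in place of the `LOOKAHEAD(2)` choice (untested here). The Java below is a minimal hand-written sketch of that one-token decision, not JavaCC output; the class name `NearFactoring`, the pre-split token list and the regular expression used to recognise NEAR tokens are all assumptions.

```java
import java.util.List;

/**
 * Hypothetical sketch of a left-factored NEAR rule:
 *   nearOrTerm := TERM ( NEAR-op TERM )?
 * Tokens are assumed to be pre-split strings; parentheses and quoted phrases are ignored.
 */
final class NearFactoring {
    private final List<String> toks;
    private int pos = 0;

    NearFactoring(List<String> toks) { this.toks = toks; }

    // Look at the current token without consuming it (null at end of input).
    private String peek() { return pos < toks.size() ? toks.get(pos) : null; }

    // Consume the current token, failing with a message if input has run out.
    private String next(String what) {
        if (pos >= toks.size()) throw new IllegalStateException(what + " expected");
        return toks.get(pos++);
    }

    /** Assumed shape of the NEAR token family: NEAR, NEARn, NEAR/n, any case. */
    static boolean isNear(String tok) {
        return tok != null && tok.matches("(?i)near(/?\\d+)?");
    }

    /** Consume one term, then decide on a single token whether a NEAR follows. */
    String nearOrTerm() {
        String left = next("term");
        if (!isNear(peek())) {
            return left;                               // plain term, no NEAR node
        }
        String op = next("NEAR operator");
        String right = next("right operand of " + op);
        return "(" + op + " " + left + " " + right + ")";
    }
}
```

For example, `new NearFactoring(List.of("apples", "NEAR/3", "oranges")).nearOrTerm()` returns `(NEAR/3 apples oranges)`, while a lone `apples` comes back unchanged. Because NEAR cannot legally follow a unary expression anywhere else, one token is enough to make the decision.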
The andExpr() is the one that puzzles me the most. It looks to me to be
terribly convoluted, but it does the job. If two words are adjacent in
the query without any operator, this should, and does, default to AND. As
well, I've got to allow for "X not Y", which will be converted to "X and
not Y", as NOT is unary. I just wonder if andExpr() is doing far more
work than it needs to.
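Regarding andExpr(): since unaryExpr() already accepts NOT and plain terms, one possible simplification is to make the AND keyword optional between operands, i.e. `unaryExpr() ( [ <AND> ] unaryExpr() )*`. Adjacency and "X not Y" then both fall out of a single loop. The hand-written Java below is a minimal sketch of that rule only (class and method names are hypothetical; tokens are assumed pre-split; no parentheses, phrases or NEAR), returning a prefix string instead of JJTree nodes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a simplified AND rule:
 *   andExpr := unary ( [AND] unary )*
 *   unary   := NOT unary | TERM
 * Tokens are assumed to be pre-split words; output is a prefix string, not JJTree nodes.
 */
final class AndSketch {
    private final List<String> toks;
    private int pos = 0;

    AndSketch(List<String> toks) { this.toks = toks; }

    private String peek() { return pos < toks.size() ? toks.get(pos) : null; }

    private static boolean is(String tok, String keyword) {
        return tok != null && tok.equalsIgnoreCase(keyword);
    }

    /** One loop covers "X AND Y", "X Y" and "X NOT Y". */
    String andExpr() {
        List<String> operands = new ArrayList<>();
        operands.add(unary());
        while (peek() != null && !is(peek(), "OR")) {   // OR or end of input closes the group
            if (is(peek(), "AND")) {
                pos++;                                   // the AND keyword is optional
            }
            operands.add(unary());
        }
        return operands.size() == 1
                ? operands.get(0)                         // like #AND(>1): no node for one operand
                : "(AND " + String.join(" ", operands) + ")";
    }

    /** NOT is unary, so "X NOT Y" becomes AND(X, NOT(Y)) without special casing. */
    private String unary() {
        String tok = peek();
        if (tok == null || is(tok, "AND") || is(tok, "OR")) {
            throw new IllegalStateException("operand expected at token " + pos);
        }
        pos++;
        return is(tok, "NOT") ? "(NOT " + unary() + ")" : tok;
    }
}
```

For example, `new AndSketch(List.of("apples", "not", "oranges")).andExpr()` returns `(AND apples (NOT oranges))`, the same tree as for `apples AND NOT oranges`. One design consequence worth checking: in the JavaCC version this rule would also accept a parenthesised group, a query reference or a NEAR expression as an implicit AND operand, which, if I read the posted grammar right, the `( notExpr() | term() )*` loops do not (e.g. "a b NEAR c").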
Thanks in advance to anyone who takes the time to offer advice to
someone new to grammar writing.
-- Robert
----- extract from BooleanQueryParser.jjt -----
SimpleNode parse() #ROOT : {}
{
  expr() <EOF>
  { return jjtThis; }
}
void expr() : {}
{
  orExpr()
}
void orExpr() : {}
{
  (
    andExpr() ( <OR> andExpr() )*
    { jjtThis.setName("OR"); }
  ) #OR(>1)
}
void andExpr() : {}
{
  (
    unaryExpr() ( notExpr() | term() )*
    ( <AND> unaryExpr() ( notExpr() | term() )* )*
    { jjtThis.setName("AND"); }
  ) #AND(>1)
}
void notExpr() : {}
{
  (
    <NOT> unaryExpr()
    { jjtThis.setName("NOT"); }
  ) #NOT
}
void nearExpr() :
{ String nearOp = null; }
{
  (
    term() nearOp = nearOp() term()
    { jjtThis.setName(nearOp); }
  ) #NEAR(>1)
}
void unaryExpr() : {}
{
    "(" expr() ")"
  | qref()
  | notExpr()
  | ( LOOKAHEAD(2) nearExpr() | term() )
}
String nearOp() :
{ Token t; }
{
  ( t = <NEAR> )
  {
    // NEAR, NEARn and NEAR/n may appear in any case; a bare NEAR means NEAR/6
    if (t.image.equalsIgnoreCase("NEAR")) {
      t.image = "NEAR/6";
    }
    else {
      // the distance starts after "NEAR/" (index 5) or "NEAR" (index 4)
      int nearNumPos = (t.image.indexOf("/") == 4) ? 5 : 4;
      int nearInt;
      try {
        nearInt = Integer.parseInt(t.image.substring(nearNumPos));
      }
      catch (NumberFormatException e) {
        throw new ParseException(t.image +
          " is not a proper NEAR modifier");
      }
      if (nearInt < 1) { nearInt = 6; }  // fall back to the default distance
      // build the image from the checked value, not the raw digits,
      // so that the fallback above actually takes effect
      t.image = "NEAR/" + nearInt;
    }
    return t.image;
  }
}
// query references
void qref() #QREF :
{ Token t; }
{
  ( t = <QREF> )
  { jjtThis.setName(t.image); }
}
// just plain words or "quoted phrases"
void term() #TERM :
{ Token t; }
{
  ( t = <TERM> )
  { jjtThis.setName(t.image); }
}
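The number handling in nearOp() can also be pulled out of the grammar into an ordinary Java method and tested without regenerating the parser. The helper below is a hypothetical sketch (the class name `NearNormalizer` and the use of `IllegalArgumentException` are assumptions, not part of the posted grammar). It accepts NEAR, NEARn and NEAR/n in any case, defaults to 6, and replaces a non-positive distance with the default, which is what the `if (nearInt < 1)` line in nearOp() appears to intend.

```java
/**
 * Hypothetical helper mirroring the nearOp() action outside JavaCC,
 * so the normalisation rules can be unit-tested on plain strings.
 */
final class NearNormalizer {
    static final int DEFAULT_DISTANCE = 6;

    /** Maps NEAR, NEARn or NEAR/n (any case) to the canonical form "NEAR/n". */
    static String normalize(String image) {
        if (image.length() < 4 || !image.regionMatches(true, 0, "NEAR", 0, 4)) {
            throw new IllegalArgumentException(image + " is not a NEAR operator");
        }
        if (image.length() == 4) {
            return "NEAR/" + DEFAULT_DISTANCE;        // bare NEAR
        }
        // skip the optional '/' that separates NEAR from the distance
        int start = (image.charAt(4) == '/') ? 5 : 4;
        int distance;
        try {
            distance = Integer.parseInt(image.substring(start));
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(image + " is not a proper NEAR modifier", e);
        }
        if (distance < 1) {
            distance = DEFAULT_DISTANCE;              // non-positive distance falls back to the default
        }
        return "NEAR/" + distance;
    }
}
```

For example, `NearNormalizer.normalize("Near3")` returns `NEAR/3` and `NearNormalizer.normalize("near/0")` returns `NEAR/6`. Inside the grammar, the nearOp() action could then call it on `t.image` and rethrow the exception as a ParseException.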