Examples of using Diplodicus Parsing

"tj bandrowsky" <tbandrow@unitedsoftworks.com>
24 Jul 2002 02:25:34 -0400

          From comp.compilers

From: "tj bandrowsky" <tbandrow@unitedsoftworks.com>
Newsgroups: comp.compilers
Date: 24 Jul 2002 02:25:34 -0400
Organization: http://groups.google.com/
Keywords: parse
Posted-Date: 24 Jul 2002 02:25:34 EDT

Diplodicus is buggy and slow, but I've been using a test suite to
shake it out a bit. In the example below, I'm trying to parse an XML
file. Note that my grammar may be buggy, but I'm working on it as I
write this.


In an earlier post, I said that diplodicus is a parser generator, but
I misspoke: it's actually a programmable parser. Basically, you add
lexical rules and grammatical rules to a shift_reduce_parser_type
object, which manages these rules for you. I do this for convenience's
sake, but it might be incorrect. It also seems a good place to
introduce some parallelism.


A few things to note:


a) I use / instead of \ as the escape character in my limited regular
expressions. That makes them easier to write in C strings, where \
itself would have to be escaped.


b) this is a bottom up parser.


c) grammatical rules are never ambiguous.


d) lexical rules can sometimes be ambiguous, because of stupid stuff
like "9abc" also matching [/d][/a]+. This can be something of a
headache. I'm learning that it is best to keep the lexical rules simple
and the token sizes small.


e) the integers given are for a switch statement on the receiving end.
  A real world use of diplodicus would do something like:


parser.add_lexical_rule( ID_XML_START, "start", "<" );
parser.add_lexical_rule( ID_END_START, "endstart", "<//" );


and there would be a


switch (blah) {
    case ID_XML_START:
        break;
    case ID_END_START:
        break;
}


f) parser.parse is written to be continuous. I wrote it this way so I
could recv and then parse 3 bytes, recv and then parse 7 bytes, etc.
It's the same sort of problem as with async I/O on NT: you never know
how many bytes you will actually get when doing things in parallel.


g) the search for matching right-hand sides is fairly quick. On the
other hand, it has to do it a lot, so diplodicus drags at the moment.
Things could get faster, as I'm still doing a lot of stupid stuff, but
diplodicus will never run at Mach 2.5, although it will be easy as
heck to make compilers of all shapes and sizes with it. After I finish
profile server I want to experiment with a language that has units,
you know, like degrees, etc.


h) $delete is a special rule type that clears rather than reduces the
stack up to the point where the right-hand side match took place. In
the example below I use it for white space removal.


If this approach is completely crazy let me know.


void test_parser3( char *_filename )
{
    // then, test the engine
    shift_reduce_parser_type parser;

    parser.add_lexical_rule( 1, "start", "<" );
    parser.add_lexical_rule( 2, "endstart", "<//" );
    parser.add_lexical_rule( 3, "end", ">" );
    parser.add_lexical_rule( 4, "commentstart", "<!--" );
    parser.add_lexical_rule( 5, "dtdstart", "<!DOCTYPE" );
    parser.add_lexical_rule( 6, "docrefstart", "<?xml" );
    parser.add_lexical_rule( 7, "docrefend", "?>" );
    parser.add_lexical_rule( 8, "ws", "/w+" );
    parser.add_lexical_rule( 9, "name", "[/a][/a/d_]+" );
    parser.add_lexical_rule( 10, "number", "[/d]+" );
    parser.add_lexical_rule( 11, "floatnumber", "[/d]+/.[/d]+" );
    parser.add_lexical_rule( 12, "floatnumber", "/.[/d]+" );
    parser.add_lexical_rule( 13, "quote", "\"" );
    parser.add_lexical_rule( 14, "otherpunct", "[-~!@#$%^&/*()_;|:+-/]/[/+{}//\\?,/.]" );
    parser.add_lexical_rule( 15, "equal", "=" );
    parser.add_lexical_rule( 16, "attrtagend", "//>" );
    parser.add_lexical_rule( 17, "system", "SYSTEM" );

    parser.add_grammatical_rule( 50, "$delete", "ws" );
    parser.add_grammatical_rule( 51, "otherpunct", "otherpunct otherpunct" );
    parser.add_grammatical_rule( 52, "quotedstring", "quote start" );
    parser.add_grammatical_rule( 53, "quotedstring", "quote name" );
    parser.add_grammatical_rule( 54, "quotedstring", "quote number" );
    parser.add_grammatical_rule( 55, "quotedstring", "quote floatnumber" );
    parser.add_grammatical_rule( 56, "quotedstring", "quote otherpunct" );
    parser.add_grammatical_rule( 57, "quotedstring", "quote end" );
    parser.add_grammatical_rule( 58, "quotedstring", "quotedstring name" );
    parser.add_grammatical_rule( 59, "quotedstring", "quotedstring number" );
    parser.add_grammatical_rule( 60, "quotedstring", "quotedstring floatnumber" );
    parser.add_grammatical_rule( 61, "quotedstring", "quotedstring otherpunct" );
    parser.add_grammatical_rule( 62, "quotedstring", "quotedstring end" );
    parser.add_grammatical_rule( 63, "quotedstring", "quotedstring start" );
    parser.add_grammatical_rule( 64, "string", "quotedstring quote" );
    parser.add_grammatical_rule( 65, "commenting", "commentstart name" );
    parser.add_grammatical_rule( 66, "commenting", "commentstart number" );
    parser.add_grammatical_rule( 67, "commenting", "commentstart floatnumber" );
    parser.add_grammatical_rule( 68, "commenting", "commentstart otherpunct" );
    parser.add_grammatical_rule( 69, "commenting", "commenting name" );
    parser.add_grammatical_rule( 70, "commenting", "commenting number" );
    parser.add_grammatical_rule( 71, "commenting", "commenting floatnumber" );
    parser.add_grammatical_rule( 72, "commenting", "commenting otherpunct" );
    parser.add_grammatical_rule( 73, "comment", "commenting end" );
    parser.add_grammatical_rule( 74, "$delete", "comment" );
    parser.add_grammatical_rule( 75, "property", "name equal number" );
    parser.add_grammatical_rule( 76, "property", "name equal floatnumber" );
    parser.add_grammatical_rule( 77, "property", "name equal string" );
    parser.add_grammatical_rule( 78, "property", "property property" );
    parser.add_grammatical_rule( 79, "xmlattribute", "start name property attrtagend" );
    parser.add_grammatical_rule( 80, "xmlattribute", "xmlattribute xmlattribute" );
    parser.add_grammatical_rule( 81, "xmltagstart", "start name property end" );
    parser.add_grammatical_rule( 82, "xmltagstart", "start name end" );
    parser.add_grammatical_rule( 83, "xmltagend", "endstart name end" );

    parser.add_grammatical_rule( 84, "xmldatatag", "xmltagstart name" );
    parser.add_grammatical_rule( 85, "xmldatatag", "xmltagstart number" );
    parser.add_grammatical_rule( 86, "xmldatatag", "xmltagstart floatnumber" );
    parser.add_grammatical_rule( 87, "xmldatatag", "xmltagstart otherpunct" );
    parser.add_grammatical_rule( 88, "xmldatatag", "xmltagstart equal" );
    parser.add_grammatical_rule( 89, "xmldatatag", "xmltagstart property" );
    parser.add_grammatical_rule( 90, "xmldatatag", "xmltagstart string" );
    parser.add_grammatical_rule( 91, "xmldatatag", "xmldatatag name" );
    parser.add_grammatical_rule( 92, "xmldatatag", "xmldatatag number" );
    parser.add_grammatical_rule( 93, "xmldatatag", "xmldatatag floatnumber" );
    parser.add_grammatical_rule( 94, "xmldatatag", "xmldatatag otherpunct" );
    parser.add_grammatical_rule( 95, "xmldatatag", "xmldatatag equal" );
    parser.add_grammatical_rule( 96, "xmldatatag", "xmldatatag property" );
    parser.add_grammatical_rule( 97, "xmldatatag", "xmldatatag string" );

    parser.add_grammatical_rule( 98, "xmltag", "xmldatatag xmltagend" );
    parser.add_grammatical_rule( 99, "xmltag", "xmltagstart xmlattribute xmltagend" );
    parser.add_grammatical_rule( 100, "xmltag", "xmltagstart xmlattribute xmltag xmltagend" );
    parser.add_grammatical_rule( 101, "xmltag", "xmltagstart xmltag xmlattribute xmltagend" );
    parser.add_grammatical_rule( 102, "xmltag", "xmltagstart xmltag xmltagend" );
    parser.add_grammatical_rule( 103, "xmltag", "xmltag xmltag" );
    parser.add_grammatical_rule( 104, "xmldescr", "docrefstart property docrefend" );
    parser.add_grammatical_rule( 105, "xmldtd", "dtdstart name system string end" );
    parser.add_grammatical_rule( 106, "xmldocument", "xmldescr xmldtd xmltag" );

    parser.start();

    page_string_type<char> my_data;
    int l = read_file( _filename, my_data );

    parser.parse( my_data, l );

    parser.finish();
}

