Related articles |
---|
APG – ABNF Parser Generator, Version 7.0 ldt@sabnf.com (Lowell Thomas) (2021-02-21) |
From: | Lowell Thomas <ldt@sabnf.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 21 Feb 2021 16:35:08 -0500 |
Organization: | Compilers Central |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="52791"; mail-complaints-to="abuse@iecc.com" |
Keywords: | parse, available |
Posted-Date: | 21 Feb 2021 17:05:32 EST |
APG – ABNF Parser Generator, Version 7.0 is now available. APG generates
recursive-descent parsers directly from ABNF grammars and is therefore
well-suited to applications for Internet specifications, which are often
defined with ABNF syntax. Translations are done via callback functions
in real time from the parse tree nodes or, optionally, at a later stage
from the AST nodes if you choose to generate one.
The source code is here (https://github.com/ldthomas/apg-7.0) and the
documentation is here (https://sabnf.com/documentation-2/). It is
licensed with the permissive 2-Clause BSD license so you can use it
pretty much as you like.
I’ve been away from this for a while and it is hard to believe that it
has been 16 years since the first version
(2005-06-04)(https://compilers.iecc.com/comparch/article/05-06-027) and
9 years since the last C version
(2012-07-02)(https://compilers.iecc.com/comparch/article/12-07-003). But
I wanted to bring it up to date and add a few features I’ve been
planning for a while. Actually, quite a few, but I’ll just mention two or
three of the main additions here.
Optionally, APG can generate and use Partially-Predictive Parsing Tables
(PPPTs). That is, from an examination of the ABNF grammar the generator
can determine the range of alphabet characters and generate a table, one
entry for each character and parse tree node. A PPPT entry can have one
of four values:
• match – the node accepts the single character as a complete
phrase match
• empty – the node does not accept the character but does accept an
empty string match
• no match – the node rejects the character
• active – the node accepts the character but not as a full phrase
match; parsing must continue normally
The entries are not just for the terminal nodes. The generator has rules
for walking back up the parse tree and generating a table entry for
every node, terminal and non-terminal alike. In some cases even the root
node can accept or reject a character without ever having to descend the
parse tree at all. As a general rule, I’ve found that PPPTs will
increase parsing speeds by a factor of 2.
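
To make the four entry values concrete, here is a minimal Python sketch of the idea. It is not APG's code or API: the tuple node encoding, the names `entry`, `build_table` and `parse`, the small alphabet, and especially the rules used to combine child entries into a parent entry are all assumptions made for illustration. The post does not spell out APG's actual rules, so the sketch uses deliberately conservative ones that fall back to "active" whenever the answer is not obvious from the first character.

```python
from enum import Enum

class P(Enum):
    MATCH = "match"        # the single character is a complete phrase
    EMPTY = "empty"        # character not accepted, empty match succeeds
    NOMATCH = "no match"   # character rejected
    ACTIVE = "active"      # undecided from one character: parse normally

# Hypothetical node encoding (hashable tuples, no recursive rules):
#   ("chr", c), ("alt", (n1, n2, ...)), ("cat", (n1, n2, ...)), ("rep0", n)
def entry(node, ch):
    """Assumed, conservative rules for one table entry."""
    kind = node[0]
    if kind == "chr":
        return P.MATCH if node[1] == ch else P.NOMATCH
    if kind == "alt":      # first alternative that does not reject decides
        for child in node[1]:
            e = entry(child, ch)
            if e is not P.NOMATCH:
                return e
        return P.NOMATCH
    if kind == "rep0":     # zero or more repetitions
        e = entry(node[1], ch)
        return P.EMPTY if e in (P.NOMATCH, P.EMPTY) else P.ACTIVE
    # "cat": only the first child is examined
    e = entry(node[1][0], ch)
    if e is P.NOMATCH:
        return P.NOMATCH
    return e if len(node[1]) == 1 else P.ACTIVE

def build_table(root, alphabet):
    """One entry per (node, character), for every node in the tree."""
    table = {}
    def walk(node):
        for ch in alphabet:
            table[(node, ch)] = entry(node, ch)
        if node[0] in ("alt", "cat"):
            for child in node[1]:
                walk(child)
        elif node[0] == "rep0":
            walk(node[1])
    walk(root)
    return table

def parse(node, text, pos=0, table=None):
    """Return the matched length at pos, or None on failure."""
    if table is not None and pos < len(text):
        e = table.get((node, text[pos]))
        if e is P.MATCH:
            return 1
        if e is P.EMPTY:
            return 0
        if e is P.NOMATCH:
            return None
        # ACTIVE, or a character outside the table: fall through
    kind = node[0]
    if kind == "chr":
        return 1 if text.startswith(node[1], pos) else None
    if kind == "alt":
        for child in node[1]:
            n = parse(child, text, pos, table)
            if n is not None:
                return n
        return None
    if kind == "cat":
        total = 0
        for child in node[1]:
            n = parse(child, text, pos + total, table)
            if n is None:
                return None
            total += n
        return total
    total = 0              # "rep0"
    while True:
        n = parse(node[1], text, pos + total, table)
        if not n:          # failure or empty match ends the repetition
            return total
        total += n

# Toy grammar, roughly:  list = "(" *("a" / "b") ")"
AB = ("alt", (("chr", "a"), ("chr", "b")))
ROOT = ("cat", (("chr", "("), ("rep0", AB), ("chr", ")")))
TABLE = build_table(ROOT, "ab()")
```

With this toy grammar, `TABLE[(ROOT, "a")]` is "no match", so `parse(ROOT, "abba", 0, TABLE)` fails at the root without descending into the tree, while `parse(ROOT, "(abba)", 0, TABLE)` returns the same length as the table-free parse.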
APG is developed as an API so you can build custom generators and
generate parsers on the fly in your own applications.
It includes a Pattern-Matching Engine which I believe is more powerful
than regex.
• replaces cryptic regex syntax with ABNF
• full recursion can match deeply nested pairs
• has two modes of back referencing. Introduces what I term
“parent-mode” back referencing. In particular, it makes it possible to
match not only the start and end tags of HTML or XML but the tag names
as well. I’m not really a scholar on the topic so I won’t go so far as
to say this has never been done before, but I’m not aware of this type
of back referencing in any flavor of regex.
• allows handwritten code snippets for difficult-to-define phrases
• exposes the parser’s AST for complex translations of the matched
phrases
• exposes a tracing facility which makes debugging new pattern
syntax easy
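
As an illustration of what the recursion and tag-name matching items above are aiming at, here is a small hand-written Python matcher. It does not use APG or its pattern syntax; the function name `match_element` and the simplified tag format (no attributes, no self-closing tags) are assumptions for this sketch. The point is only that each nested element must close with the same name it opened with, at its own nesting level, which is the effect I understand parent-mode back referencing to provide.

```python
import re

# Simplified tag names: a letter followed by letters or digits (assumption).
NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def match_element(text, pos=0):
    """Return the index just past a well-nested <name>...</name>
    element starting at pos, or None if there is no such element."""
    if not text.startswith("<", pos):
        return None
    m = NAME.match(text, pos + 1)
    if m is None or not text.startswith(">", m.end()):
        return None
    name = m.group()                 # remembered for this nesting level only
    closing = "</" + name + ">"
    pos = m.end() + 1
    while True:
        # Skip plain text content up to the next tag.
        while pos < len(text) and text[pos] != "<":
            pos += 1
        # The end tag must repeat the name opened at this level.
        if text.startswith(closing, pos):
            return pos + len(closing)
        # Otherwise a nested element must start here (recursive call).
        inner = match_element(text, pos)
        if inner is None:
            return None
        pos = inner
```

Here `match_element("<a><b>x</b>y</a>")` consumes the whole string, while `match_element("<a><b>x</a></b>")` returns `None` because the inner `<b>` is not closed by a matching `</b>` before `</a>` appears.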
There’s lots more, but if you are interested you can read about it in
the documentation.
Regards,
Lowell Thomas