APG Version 6.3 - an ABNF Parser Generator

Mon, 2 Jul 2012 10:53:23 -0400

          From comp.compilers


From: lowell@coasttocoastresearch.com
Newsgroups: comp.compilers
Date: Mon, 2 Jul 2012 10:53:23 -0400
Organization: Compilers Central
Keywords: tools, BNF
Posted-Date: 03 Jul 2012 09:27:15 EDT

I'd like to announce the release of APG Version 6.3.
APG (an ABNF Parser Generator) generates recursive-descent parsers using the
ABNF grammar syntax. ABNF (RFC 4234) is the grammar syntax used for most IETF
Internet Standards and Protocols and is similar to EBNF in its
descriptive power.

The main new features in Version 6.3 are:
  - the addition of User-Defined Terminal (UDT) nodes
  - allowance for variable-width alphabet character codes
      (8, 16, 32 or 64 bits)
  - runs on 32- or 64-bit operating systems
      (previous versions were 32-bit only)

APG views the syntax tree nodes as operators, each operator having its own
instructions on what to do and where to go next. The terminal nodes are
phrase recognizers and the non-terminal nodes are phrase manipulators.
For example, the ABNF literal string is a terminal node which recognizes
case-insensitive strings of printable ASCII characters. The repetition
operator is a non-terminal node phrase manipulator which concatenates
repeated occurrences of a defined phrase. The ABNF grammar syntax defines
context-free languages. Generating parsers from the ABNF syntax
was the single, original intent of APG (05-06-027).
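
To make the operator view concrete, here is a minimal sketch in C of one
terminal node and one non-terminal node. It is an illustration only, not
APG's generated code: the names match_literal, match_repetition, node_fn
and NO_MATCH are hypothetical, and the input is assumed to be plain 8-bit
characters. Each operator returns the length of the phrase it matched, or
NO_MATCH on failure.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NO_MATCH ((size_t)-1)   /* hypothetical "no phrase matched" value */

/* Hypothetical operator signature: try to match a phrase at input[pos]. */
typedef size_t (*node_fn)(const char *input, size_t len, size_t pos,
                          const void *arg);

/* Terminal node: recognizes a case-insensitive literal string (arg). */
size_t match_literal(const char *input, size_t len, size_t pos,
                     const void *arg)
{
    const char *lit = (const char *)arg;
    size_t n = strlen(lit);
    size_t i;

    if (pos > len || n > len - pos)
        return NO_MATCH;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)input[pos + i]) !=
            tolower((unsigned char)lit[i]))
            return NO_MATCH;
    }
    return n;
}

/* Non-terminal node: concatenates between min and max occurrences of the
 * phrase recognized by the child operator. */
size_t match_repetition(const char *input, size_t len, size_t pos,
                        node_fn child, const void *child_arg,
                        size_t min, size_t max)
{
    size_t total = 0;
    size_t count = 0;

    while (count < max) {
        size_t m = child(input, len, pos + total, child_arg);
        if (m == NO_MATCH)
            break;
        if (m == 0) {
            /* An empty match could repeat forever: accept and stop. */
            if (count < min)
                count = min;
            break;
        }
        total += m;
        count++;
    }
    return (count >= min) ? total : NO_MATCH;
}
```

For example, match_repetition("ababab!", 7, 0, match_literal, "AB", 1, 10)
matches the first six characters, ignoring case.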

However, in later versions APG has taken a much more cavalier approach to
these node operators, more or less inventing operators to fit the language
specification rather than simply implementing a CFG parser. For example,
it now implements the same non-terminal, syntactic predicate & and !
operators found in PEG. And the non-terminal rule (production) nodes have,
at least since Version 5.0, allowed great flexibility through their
user-written callback functions. This flexibility includes administrative work
(handling tables and data as computations done parallel to but not
interfering with the CFG parsing), rejecting the results of the parser
on semantic grounds, and more drastic, language-altering tinkering
with the phrase recognition work.
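
As a rough illustration of the syntactic predicate operators mentioned
above, the following C sketch shows the usual PEG behavior: & succeeds
(matching the empty phrase) only if its operand matches at the current
position, ! succeeds only if it does not, and neither consumes input. The
names and_predicate, not_predicate, pred_child_fn and match_digits are
hypothetical and are not taken from APG.

```c
#include <stddef.h>

#define NO_MATCH ((size_t)-1)   /* hypothetical "no phrase matched" value */

/* Hypothetical child operator: returns matched length or NO_MATCH. */
typedef size_t (*pred_child_fn)(const char *input, size_t len, size_t pos);

/* Example child: one or more ASCII digits. */
size_t match_digits(const char *input, size_t len, size_t pos)
{
    size_t n = 0;

    while (pos + n < len && input[pos + n] >= '0' && input[pos + n] <= '9')
        n++;
    return (n > 0) ? n : NO_MATCH;
}

/* &child: look ahead; succeed with an empty phrase if child matches. */
size_t and_predicate(pred_child_fn child, const char *input, size_t len,
                     size_t pos)
{
    return (child(input, len, pos) != NO_MATCH) ? 0 : NO_MATCH;
}

/* !child: look ahead; succeed with an empty phrase if child fails. */
size_t not_predicate(pred_child_fn child, const char *input, size_t len,
                     size_t pos)
{
    return (child(input, len, pos) == NO_MATCH) ? 0 : NO_MATCH;
}
```

Because both predicates return a zero-length phrase on success, the parser
position is unchanged and the next operator sees the same input.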

I've since found this third use, hand writing some of the phrase recognition
work, to be very useful. However, with the rule name operators it is somewhat
difficult to implement (and explain). I've therefore added a new
User-Defined Terminal (UDT) operator which simply calls a
user-written function to do the phrase recognition work. I see several
different levels of usage possible:

1. "safe" - speed up the parser without altering the language defined
      by the grammar. That is, replacing some rules with UDTs that
      recognize the same phrase but do it more quickly.
      I've seen dramatic (up to 10-fold) increases in parser speed with
      UDTs for a few simple and common phrases.
2. "unsafe" - using UDTs to handle a few non-CFG requirements in an otherwise
      nearly CFG language.
3. "totally dangerous" - simply using APG as a recursive-descent backbone
      in an otherwise completely handwritten parser.
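
A sketch of what the "safe" level might look like in C follows. It assumes
a hypothetical grammar rule  name = ALPHA *(ALPHA / DIGIT / "-")  and
replaces it with a single hand-written loop that recognizes the same
phrases. The function name udt_name, its signature and the NO_MATCH value
are assumptions for illustration; the actual UDT callback interface is
described in the APG documentation.

```c
#include <stddef.h>

#define NO_MATCH ((size_t)-1)   /* hypothetical "no phrase matched" value */

/* ABNF core rules: ALPHA = %x41-5A / %x61-7A, DIGIT = %x30-39. */
static int udt_is_alpha(unsigned char c)
{
    return (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}

static int udt_is_digit(unsigned char c)
{
    return c >= 0x30 && c <= 0x39;
}

/* Hand-written recognizer for  name = ALPHA *(ALPHA / DIGIT / "-").
 * Returns the length of the matched phrase, or NO_MATCH. */
size_t udt_name(const char *input, size_t len, size_t pos)
{
    size_t n;

    /* The phrase must start with a letter. */
    if (pos >= len || !udt_is_alpha((unsigned char)input[pos]))
        return NO_MATCH;

    /* Then take the longest run of letters, digits and hyphens. */
    n = 1;
    while (pos + n < len) {
        unsigned char c = (unsigned char)input[pos + n];
        if (!(udt_is_alpha(c) || udt_is_digit(c) || c == '-'))
            break;
        n++;
    }
    return n;
}
```

The loop visits each character once with no operator dispatch, which is
the kind of shortcut that makes a UDT faster than walking the equivalent
rule's syntax tree while leaving the defined language unchanged.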

APG Version 6.3 generates parsers in C and C++.
You can find the documentation and download here:

Within the terms of the GPL, you are free to use and modify it as you see fit.

Lowell Thomas
