alpha release "pp" pattern parsing language and machine

mjbishop@fastmail.com (matthew bishop)
Sat, 17 Aug 2019 21:45:17 +0000 (UTC)

          From comp.compilers


From: mjbishop@fastmail.com (matthew bishop)
Newsgroups: comp.compilers
Date: Sat, 17 Aug 2019 21:45:17 +0000 (UTC)
Organization: Aioe.org NNTP Server
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="65818"; mail-complaints-to="abuse@iecc.com"
Keywords: parse, available
Posted-Date: 18 Aug 2019 14:33:46 EDT

Hello, I have written a small scripting language for parsing and
compiling mainly context-free languages.


I had an idea for parsing languages using a stack/tape (string array)
combination that stays in synchronisation through "push" and "pop" commands.
The idea seems to work. For example at
http://bumble.sourceforge.net/books/gh/eg/exp.tolisp.pss is a script that
translates simple arithmetic expressions into a lisp-like syntax. Also, at
http://bumble.sourceforge.net/books/gh/compilable.c.pss is a script which
translates parse-scripts into compilable c code (so the scripts can be
compiled to standalone executables).


The script language and virtual machine are implemented at
http://bumble.sourceforge.net/books/gh/object/ . The system reads the
input stream one character at a time and constructs and compiles parse
tokens.


The machine and language were mainly inspired by "sed", by both its
strengths and its weaknesses.


Here is a small snippet showing the relationship between an
ebnf rule and some code in a parse-language script.


    # ebnf rule: commandset := commandset, command ;
    "commandset*command*" {
          clear; add "commandset*" ; push;
    }


The script snippet above implements the given ebnf rule. But the script
language can also compile the "attributes" of the grammar.
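
As a rough illustration of the reduction in the snippet above, here is a
sketch in Python (not the parse-script language itself). It assumes the two
topmost stack tokens have been gathered into a text workspace before the
test is made; the function name reduce_commandset and the use of a Python
list as the stack are hypothetical, chosen only for this example.

```python
def reduce_commandset(stack):
    """Apply the rule  commandset := commandset, command  to a token stack.

    Assumption: tokens are strings ending in '*', and the newest token is
    at the end of the list. Returns True if a reduction happened.
    """
    # Gather the top two tokens into a text "workspace", oldest first.
    workspace = "".join(stack[-2:])
    if workspace == "commandset*command*":
        del stack[-2:]               # the two matched tokens are consumed
        workspace = "commandset*"    # corresponds to: clear; add "commandset*";
        stack.append(workspace)      # corresponds to: push;
        return True
    return False


stack = ["commandset*", "command*"]
reduce_commandset(stack)
print(stack)
```

The point of the sketch is only that a grammar rule becomes a string
comparison on the workspace followed by a replacement token being pushed.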


Complete scripts can be written on the command line, as with sed. E.g.:
    pp -e 'read; [aeiou] { add "(vowel)"; } print; clear;' -i "abcde"
    (output is) "a(vowel)bcde(vowel)"
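
To make the read/test/print/clear cycle of that one-liner concrete, here is
a small Python sketch that imitates it. It assumes that the script is
restarted for each input character and that "read" places one character in
the workspace; the function name run_vowel_script is hypothetical and is
not part of "pp".

```python
import re


def run_vowel_script(text):
    """Imitate: read; [aeiou] { add "(vowel)"; } print; clear;"""
    output = []
    for ch in text:
        workspace = ch                          # read: one character
        if re.fullmatch(r"[aeiou]", workspace):
            workspace += "(vowel)"              # add "(vowel)";
        output.append(workspace)                # print;
        workspace = ""                          # clear;
    return "".join(output)


print(run_vowel_script("abcde"))  # a(vowel)bcde(vowel)
```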


I use the -i switch here to provide input because the "pp" tool is
also a debugger at the moment (you can step through and view the
compiled program and machine state) and so cannot accept input piped
through stdin. In the future I will separate "pp" into two tools: one
that is a debugger and script interpreter, and another that just
runs (interprets) the script.


The executable "pp" includes a script interpreter and viewer/debugger. The
script language can implement itself (!). For example:
http://bumble.sourceforge.net/books/gh/compile.pss contains a working
implementation (compiler) of the script language written as a parse-script
(it is boot-strapped by the "assembler" program /books/gh/asm.pp).


The idea uses a stack to maintain the parse tokens, the "tape" to maintain
and compile the "attributes", and a "workspace" as a text accumulator to
manipulate tokens and attributes.
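
The following Python sketch shows one way the stack, tape and workspace
could cooperate. It is not the actual "pp" machine: the class Machine, the
helpers shift and reduce_expr, the token names "expr*" and "op*", and the
exact behaviour of push and pop are all assumptions made for illustration.
The only property taken from the description above is that push and pop
move the tape pointer together with the stack, so each token on the stack
lines up with its attribute on the tape.

```python
class Machine:
    def __init__(self):
        self.stack = []      # parse tokens such as "expr*"
        self.tape = [""]     # one attribute cell per token position
        self.cell = 0        # current tape cell
        self.workspace = ""  # text accumulator

    def push(self):
        """Move the first token in the workspace to the stack, advance the tape."""
        cut = self.workspace.find("*") + 1
        if cut == 0:
            return False
        self.stack.append(self.workspace[:cut])
        self.workspace = self.workspace[cut:]
        self.cell += 1
        if self.cell == len(self.tape):
            self.tape.append("")
        return True

    def pop(self):
        """Move the top token back in front of the workspace, rewind the tape."""
        if not self.stack:
            return False
        self.workspace = self.stack.pop() + self.workspace
        self.cell -= 1
        return True


def shift(m, token, attribute):
    m.tape[m.cell] = attribute  # store the attribute in the current cell
    m.workspace = token
    m.push()


def reduce_expr(m):
    """Hypothetical rule: expr := expr op expr, building a lisp-like attribute."""
    for _ in range(3):
        m.pop()
    if m.workspace == "expr*op*expr*":
        left, op, right = m.tape[m.cell:m.cell + 3]
        m.tape[m.cell] = f"({op} {left} {right})"
        m.workspace = "expr*"
        m.push()
        return True
    while m.push():             # no match: restore the stack
        pass
    return False


m = Machine()
for token, attribute in [("expr*", "1"), ("op*", "+"), ("expr*", "2")]:
    shift(m, token, attribute)
reduce_expr(m)
print(m.stack, m.tape[0])
```

After the reduction the stack holds the single token "expr*" and the first
tape cell holds "(+ 1 2)", which is the kind of token/attribute pairing
that the description relies on.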


This is a small idea, but I think it has potential. I would like to know what
other people think about it. The idea is so simple, and seems so effective,
that I doubt no-one else has implemented it before.


This is an open source software project.


The code is in an "alpha" stage. Useful scripts can be written, run,
viewed, and debugged (with the "pp" executable) or compiled with the
/books/gh/compilable.c.pss script. But the code needs to be reorganised
(the struct Program object, for example, should not be a member
of the struct Machine object). There is also a malloc-related
segmentation fault that I need to track down. Finally, I need to think of
a good name for the system and debugger (or just leave it as "pp" for
pattern parser).


I would appreciate any feedback or contributions to this project.
I would also be interested to hear if anyone knows of a similar system
that has already been implemented.


regards
Matthew

