A code generator for regular expressions with embedded actions (on potentially interrupted streams)

Hugo LECLERC <hugal.leclerc@gmail.com>
Tue, 23 Aug 2016 10:41:41 +0200

          From comp.compilers


From: Hugo LECLERC <hugal.leclerc@gmail.com>
Newsgroups: comp.compilers
Date: Tue, 23 Aug 2016 10:41:41 +0200
Organization: Compilers Central
Injection-Info: miucha.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="55833"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, available
Posted-Date: 23 Aug 2016 12:45:09 EDT

Hello,


For some of my projects (with specific parsing needs), I've worked on
a code generator, named HPIPE, that handles regular expressions with
embedded actions. It is designed notably for streams of data where
interruptions are possible (but it can also be optimized for cases where
all the data is available in one place).
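
To make the "interrupted stream" case concrete, here is a minimal hand-written sketch in Python. It is not HPIPE output and does not use any HPIPE API; the class name NumberScanner and the callback on_number are hypothetical names chosen for illustration. It scans for runs of digits and fires an action at the end of each run, keeping its state between calls so that a run split across two chunks is still reported once.

```python
class NumberScanner:
    """Hypothetical resumable scanner (illustration only, not HPIPE code).

    Finds runs of ASCII digits in a byte stream and calls `on_number`
    once per run, even when a run is split across two chunks.
    """

    def __init__(self, on_number):
        self.on_number = on_number
        # None: not inside a number; int: value of the digits seen so far.
        self.value = None

    def feed(self, chunk: bytes) -> None:
        """Consume one chunk; the state survives until the next call."""
        for byte in chunk:
            if 48 <= byte <= 57:  # b'0' .. b'9'
                self.value = (self.value or 0) * 10 + (byte - 48)
            elif self.value is not None:
                # A non-digit ends the current run: fire the action.
                self.on_number(self.value)
                self.value = None

    def end(self) -> None:
        """Signal end of stream so a trailing run is reported too."""
        if self.value is not None:
            self.on_number(self.value)
            self.value = None


found = []
scanner = NumberScanner(found.append)
# The number 1234 is deliberately split across the first two chunks.
for chunk in (b"len=12", b"34;id=", b"7"):
    scanner.feed(chunk)
scanner.end()
print(found)
```

The point of the sketch is only that all matching state lives in the object rather than on the call stack, so the caller can stop after any chunk and resume later.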


In brief, compared to similar programs, the focus is on
simplicity, automation and optimization.


To be more specific, it may be compared to the excellent program
RAGEL, except mainly that
- actions (at a given position) are executed only if the expression
matches up to the end and has the priority. It means notably that if
two expressions with actions fully match, only the actions from the
highest-priority expression will be executed. Besides, if some rewind
is needed, HPIPE generates the needed code (for instance by
incrementing a ref count on the buffers if the stream is interrupted),
but HPIPE tries to avoid rewinds if there are overall faster
solutions,
- HPIPE can use training data to generate highly optimized code (for
instance for the conditions, to evaluate the average benefit of
Boyer-Moore-like optimizations, etc.),
- the semantics are meant to enable more optimizations. For instance,
some actions that would be simple to write manually are predefined so
that HPIPE can build the data structures expected for the best
optimizations (for zero-copy, to avoid rewinds if possible, and so
on),
- to simplify the descriptions, priorities between expressions are
handled explicitly. For instance, in 'A* B', B has the priority
whereas in 'A** B', priority belongs to A.
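
The rule "actions run only if the expression matches to the end, and only for the highest-priority match" can be sketched as follows. This is a naive interpreter written for illustration, not HPIPE's algorithm (HPIPE generates code for a state machine) and not its syntax. The function name match_with_deferred_actions, the rule format (fixed-length lists of allowed characters) and the rule names are all assumptions made for this example.

```python
def match_with_deferred_actions(rules, data):
    """Illustrative sketch only (assumed rule format, not HPIPE's).

    rules: list of (name, steps), highest priority first.
    Each step is (allowed_chars, action_or_None); a rule matches when
    every character of `data` is accepted by the step at that position
    and the lengths agree.
    """
    steps_by_name = dict(rules)
    # One queue of pending (action, position) pairs per rule still alive.
    pending = {name: [] for name, _ in rules}

    for pos, ch in enumerate(data):
        for name in list(pending):
            steps = steps_by_name[name]
            if pos < len(steps) and ch in steps[pos][0]:
                action = steps[pos][1]
                if action is not None:
                    # Do not run the action yet: only remember it.
                    pending[name].append((action, pos))
            else:
                del pending[name]  # this rule failed; drop its actions

    # Walk the rules in priority order and run only the winner's actions.
    for name, steps in rules:
        if name in pending and len(steps) == len(data):
            for action, pos in pending[name]:
                action(pos)
            return name
    return None


log = []
letters = "abcdefghi"
rules = [
    ("keyword", [("i", lambda p: log.append(("kw_start", p))), ("f", None)]),
    ("ident", [(letters, lambda p: log.append(("id_start", p))), (letters, None)]),
]
winner = match_with_deferred_actions(rules, "if")
print(winner, log)
```

On the input "if" both rules match completely, but only the action queued by the higher-priority rule is executed; the lower-priority rule's queued action is discarded. On an input that stops early, such as "i", no action runs at all.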


Although in its infancy, this project is used in production and has been
tested on a lot of different cases.


Nevertheless, a lot of features could be added...


So if anyone is interested, it would be great to have your opinion,
e.g. on potentially needed changes, on places to advertise the
project, or on features to add (target languages, automation,
optimizations...)!


The source code, with more detailed documentation, is available at
https://github.com/hleclerc/hpipe.


Regards :)

