alpha version of re2c now available by anonymous ftp

peter@csg.uwaterloo.ca (Peter Bumbulis)
Sat, 16 Apr 1994 16:53:20 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: peter@csg.uwaterloo.ca (Peter Bumbulis)
Keywords: DFA, tools, FTP
Organization: University of Waterloo
Date: Sat, 16 Apr 1994 16:53:20 GMT

An alpha version of re2c is now available by anonymous ftp:


        ftp://csg.uwaterloo.ca/pub/peter/re2c.0.5.tar.gz


re2c is a tool for generating C-based recognizers from regular
expressions. re2c-based scanners are efficient: for programming
languages, given similar specifications, an re2c-based scanner is
typically almost twice as fast as a flex-based scanner, with little or no
increase in size (possibly a decrease on CISC architectures). Indeed,
re2c-based scanners are quite competitive with hand-crafted ones.


Unlike flex, re2c does not generate complete scanners: the user must
supply some interface code. While this code is not bulky (about 50-100
lines for a flex-like scanner; see the man page and examples in the
distribution), careful coding is required for efficiency (and correctness).
One advantage of this arrangement is that the generated code is not tied
to any particular input model. For example, re2c-generated code can be
used to scan data from a null-terminated buffer, as illustrated below.


Given the following source


        #define NULL ((char*) 0)
        char *scan(char *p){
        char *q;
        #define YYCTYPE char
        #define YYCURSOR p
        #define YYLIMIT p
        #define YYMARKER q
        #define YYFILL(n)
        /*!re2c
                [0-9]+ {return YYCURSOR;}
                [\000-\377] {return NULL;}
        */
        }


re2c will generate


        /* Generated by re2c on Sat Apr 16 11:40:58 1994 */
        #line 1 "simple.re"
        #define NULL ((char*) 0)
        char *scan(char *p){
        char *q;
        #define YYCTYPE char
        #define YYCURSOR p
        #define YYLIMIT p
        #define YYMARKER q
        #define YYFILL(n)
        {
                YYCTYPE yych;
                unsigned int yyaccept;
                goto yy0;
        yy1:    ++YYCURSOR;
        yy0:
                if((YYLIMIT - YYCURSOR) < 2) YYFILL(2);
                yych = *YYCURSOR;
                if(yych <= '/') goto yy4;
                if(yych >= ':') goto yy4;
        yy2:    yych = *++YYCURSOR;
                goto yy7;
        yy3:
        #line 10
                {return YYCURSOR;}
        yy4:    yych = *++YYCURSOR;
        yy5:
        #line 11
                {return NULL;}
        yy6:    ++YYCURSOR;
                if(YYLIMIT == YYCURSOR) YYFILL(1);
                yych = *YYCURSOR;
        yy7:    if(yych <= '/') goto yy3;
                if(yych <= '9') goto yy6;
                goto yy3;
        }
        #line 12


        }


Note that YYLIMIT and YYCURSOR both expand to p and YYFILL(n) expands to
nothing, so most compilers will perform dead-code elimination to remove
all of the YYCURSOR, YYLIMIT comparisons.
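
For comparison, here is a rough hand-written equivalent of what the
generated code boils down to once those comparisons are gone. This is a
sketch, not re2c output: the names scan_digits and scan_digits_demo are
made up for illustration, the pointers are const-qualified (the original
scan() takes a plain char *), and NULL comes from <stddef.h> rather than
from the #define in the example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written equivalent of the scan() example; NOT re2c output.
 * Returns a pointer just past a leading run of decimal digits,
 * or NULL when the first character is not a digit. */
const char *scan_digits(const char *p)
{
    if (*p < '0' || *p > '9')
        return NULL;            /* catch-all rule: [\000-\377] */
    do {
        ++p;                    /* consume one digit */
    } while (*p >= '0' && *p <= '9');
    return p;                   /* rule: [0-9]+ */
}

/* Hypothetical usage check; call it from main() in a test build. */
void scan_digits_demo(void)
{
    const char *s = "1994 Apr";

    assert(scan_digits(s) == s + 4);    /* stops at the space */
    assert(scan_digits("abc") == NULL); /* no leading digit */
    assert(scan_digits("") == NULL);    /* the terminating NUL is not a digit */
}
```

As in the generated code, the terminating null byte acts as the sentinel:
it is not a digit, so the loop stops there without any explicit
end-of-buffer test.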


re2c was developed for a particular project (constructing a fast REXX
scanner of all things!) and so while it has some rough edges, it should be
quite usable. More information about re2c can be found in the (admittedly
skimpy) man page; the algorithms and heuristics used are described in an
upcoming LOPLAS article (included in the distribution). Probably the best
way to find out more about re2c is to try the supplied examples. re2c is
written in C++, and is currently being developed under Linux using gcc
2.5.8.


Peter
--

