Re: Alternative Syntax for Regular Expressions?

ralph@inputplus.demon.co.uk (Ralph Corderoy)
16 Oct 2001 00:10:56 -0400

          From comp.compilers


From: ralph@inputplus.demon.co.uk (Ralph Corderoy)
Newsgroups: comp.compilers
Date: 16 Oct 2001 00:10:56 -0400
Organization: InputPlus Ltd.
References: 01-10-029 01-10-072
Keywords: lex
Posted-Date: 16 Oct 2001 00:10:56 EDT

Hi Edward,


> In Hopcroft and Ullman's 1973 book FORMAL LANGUAGES AND THEIR
> RELATION TO AUTOMATA they were among the first to reveal the
> discovery that regular expressions corresponded to a particular type
> of language, "Chomsky Type 0" which they named in honor of MIT's
> Noam Chomsky who is both a pioneer in linguistics and a political
> gadfly.


Isn't Type 3 the regular grammar under Chomsky's classification?


> Backus-Naur grammars are more readable, by several orders of
> magnitude, than regular expressions.


Just change the regular expression syntax. Perl has done this, and so
has Python. And lex gave names to parts of patterns.


> ^(\([0-9]{3}\)[ ]{1}){0,1}[0-9]{3}\-[0-9]{4}$ Yecchhhh


Just using some of these changes gives


        ^(\(\d\d\d\) )?\d\d\d-\d\d\d\d$


Why use a character class for the single space? Why suffix that class
with {1}, which is redundant? Why use {0,1} instead of "?"? Why escape
the dash?


You've made it more cluttered than necessary.
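
As a quick check that the shorter pattern accepts the same strings as
the quoted one, here is a minimal sketch using Python's re module.
Python is only an illustrative choice; the names ORIGINAL, SIMPLIFIED,
SAMPLES, accepts and same_verdict, and the sample strings themselves,
are assumptions and not from the thread. re.ASCII is set so that \d
means [0-9], as it does in the original.

```python
import re

# The pattern as quoted, and the simplified form suggested above.
ORIGINAL = re.compile(r"^(\([0-9]{3}\)[ ]{1}){0,1}[0-9]{3}\-[0-9]{4}$")
# re.ASCII restricts \d to [0-9]; without it, Python's \d also matches
# other Unicode decimal digits, which the original does not.
SIMPLIFIED = re.compile(r"^(\(\d\d\d\) )?\d\d\d-\d\d\d\d$", re.ASCII)

# Hypothetical test inputs: some valid, some deliberately malformed.
SAMPLES = [
    "555-1234",
    "(212) 555-1234",
    "(212)555-1234",    # missing space after the area code
    "212 555-1234",     # missing parentheses
    "5551234",          # missing dash
    "(212) 555-12345",  # one digit too many
    "",
]


def accepts(text):
    """Return True if the simplified pattern matches text."""
    return SIMPLIFIED.search(text) is not None


def same_verdict(text):
    """Return True if both patterns agree on whether text matches."""
    return (ORIGINAL.search(text) is not None) == accepts(text)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), accepts(sample), same_verdict(sample))
```

Agreement on a handful of samples is evidence, not proof, but each
rewrite here ([ ]{1} to a space, {0,1} to ?, \- to -, [0-9] to \d) is a
like-for-like substitution.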


> phoneNumber := STARTOFINPUT phoneNumberBody ENDOFINPUT
> phoneNumberBody := localPhoneNumber
> phoneNumberBody := areaCode SPACE localPhoneNumber
> areaCode := [0-9]{3}
> localPhoneNumber := prefix DASH suffix
> prefix := [0-9]{3}
> suffix := [0-9]{4}


Some of those lines could be done exactly the same way in lex, as I
mentioned earlier.
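
The same naming idea can be had with ordinary regular expressions by
building the pattern from named fragments. The following is a sketch
in Python rather than lex, assuming only the standard re module; the
constant names mirror the quoted grammar, and is_phone_number is a
hypothetical helper. It follows the grammar as quoted, which, unlike
the regular expression earlier in the thread, has no parentheses
around the area code.

```python
import re

# Named fragments, one per rule of the quoted grammar.
PREFIX = r"[0-9]{3}"
SUFFIX = r"[0-9]{4}"
AREA_CODE = r"[0-9]{3}"

# localPhoneNumber := prefix DASH suffix
LOCAL_PHONE_NUMBER = rf"{PREFIX}-{SUFFIX}"

# phoneNumberBody := localPhoneNumber
# phoneNumberBody := areaCode SPACE localPhoneNumber
PHONE_NUMBER_BODY = rf"(?:{AREA_CODE} )?{LOCAL_PHONE_NUMBER}"

# phoneNumber := STARTOFINPUT phoneNumberBody ENDOFINPUT
PHONE_NUMBER = re.compile(rf"^{PHONE_NUMBER_BODY}$")


def is_phone_number(text):
    """Return True if text matches the composed pattern."""
    return PHONE_NUMBER.search(text) is not None


if __name__ == "__main__":
    for sample in ["555-1234", "212 555-1234", "(212) 555-1234"]:
        print(repr(sample), is_phone_number(sample))
```

The fragments are plain strings, so the result is still a single
regular expression; only the way it is written down has changed.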


Cheers,


Ralph.

