Re: tips for writing regular expression interpreter/compiler?

"mefrill" <mefrill@yandex.ru>
2 Dec 2005 20:23:11 -0500

          From comp.compilers

Related articles
tips for writing regular expression interpreter/compiler? derekhaas@gmail.com (Derek Haas) (2005-11-26)
Re: tips for writing regular expression interpreter/compiler? jburgy@gmail.com (2005-11-30)
Re: tips for writing regular expression interpreter/compiler? rsc@swtch.com (Russ Cox) (2005-12-02)
Re: tips for writing regular expression interpreter/compiler? jeffrey.kenton@comcast.net (Jeff Kenton) (2005-12-02)
Re: tips for writing regular expression interpreter/compiler? mefrill@yandex.ru (mefrill) (2005-12-02)
Re: tips for writing regular expression interpreter/compiler? markwh04@yahoo.com (2005-12-23)
Re: tips for writing regular expression interpreter/compiler? markwh04@yahoo.com (2005-12-23)
Re: tips for writing regular expression interpreter/compiler? rsc@swtch.com (Russ Cox) (2005-12-23)
| List of all articles for this month |

From: "mefrill" <mefrill@yandex.ru>
Newsgroups: comp.compilers
Date: 2 Dec 2005 20:23:11 -0500
Organization: http://groups.google.com
References: 05-11-119
Keywords: lex
Posted-Date: 02 Dec 2005 20:23:11 EST

re --> re_list
re_list --> re_term re_list | re_term
re_term --> re_atom | re_alternation | re_kleene
re_alternation --> re_atom | re_atom
re_kleene --> re_atom '*'
re_atom --> 'one of simbols except '*' symbol and '|' symbol'


By basing on this grammar you may build the parser. Define separate
routine for each grammar's nonterminal. So, you will have re(),
re_list(), re_term(), re_alternation(), re_kleene() and re_atom()
functions. You should call the functions according to the grammar's
rules and only if th next symbol is suited. I do not like to talk about
frst and follow sets here, but you have to understand simple thing:


for example, in re_term function:


bool re_term()
{
      bool result = re_atom();


      if( result )
      {
            char symbol = look_next_symbol();
            if( symbol == '|' ) // we have alternation
            {
                  result = re_alternation();
            }
            else if( symbol == '*' ) // kleene construction
            {
                  result = re_kleene();
            }


      return result;
}


bool re_alternation()
{
      char symbol = get_next_symbol();
      if( symbol != '|' ) return false;


      bool result = re_atom();
      return result;
}


bool re_kleene()
{
      char symbol = get_next_symbol();
      if( symbol != '*' ) return false;


      return true;
}


bool re_atom()
{
      char symbol = get_next_symbol();
      if( symbol == '*' || symbol == '|' ) return false; // working
symbols cannot used as atoms


      return true;
}


Of course, you will need to make this grammar more complex, but the
approach is the same.


Vladimir.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.