--- Regexps, compilers etc., prog help ---

sjain@a.chem.upenn.edu (Hursh Jain)
11 Feb 1997 22:24:34 -0500

          From comp.compilers

Related articles
--- Regexps, compilers etc., prog help --- sjain@a.chem.upenn.edu (1997-02-11)
Re: --- Regexps, compilers etc., prog help --- the_artful_parser@null.net (Quinn Tyler Jackson) (1997-02-16)
Re: --- Regexps, compilers etc., prog help --- pfox@lehman.com (Paul David Fox) (1997-02-16)
Re: --- Regexps, compilers etc., prog help --- jmccarty@sun1307.spd.dsccc.com (1997-02-20)
Re: --- Regexps, compilers etc., prog help --- nkramer@cs.cmu.edu (Nick Kramer) (1997-02-20)
Re: --- Regexps, compilers etc., prog help --- ok@cs.rmit.edu.au (1997-02-22)
Re: --- Regexps, compilers etc., prog help --- kanze@gabi-soft.fr (1997-03-14)
[3 later articles]
| List of all articles for this month |

From: sjain@a.chem.upenn.edu (Hursh Jain)
Newsgroups: comp.compilers
Date: 11 Feb 1997 22:24:34 -0500
Organization: University of Pennsylvania
Keywords: lex, comment

Hi all:

I am interested in writing a regular expression evaluator, and also a
toy compiler..

I am doing all my programming in Javascript ( server side ). It has
all the functionality of C except no pointers. But pointers can be
simulated by using lots of arrays, where the value of one array points
to an index in another array. Eventually I'll convert everything to C.

I don't know too much theory and to be honest I don't want to get down
into too much theoretical detail ( involving books on compilers etc. )

I was wondering, can someone give me some conceptual, very high level
description in writing reg-exp evaluators and pointers to hacking a
compiler to a toy language ?

My current approach to writing a reg-exp evaluator is getting bogged
down with too many case statements. Essentially my current approach in
evaulating a small part of a reg-exp, namely stuff in brackets e.g,
[a-z, 4-5] is:

read a character
Begin: go to it's corresponding type entry in a table ( an array )

read next character.
compare with next entry in the table.
if character matches , repeat
  else goto begin: ( ignore everthing read so far, because it's not well
formed )

sort of, following chains ( or states ) which start from somewhere
and end somewhere. When you reach the end of the chain, that part of
the regular expression is well formed and can be further evaluated.


seeing "[a-f,P-Q]"

would mean "[" -> (char) ? yes
"a" -> (dash) ? yes
"-" -> (char) ? yes
"f" -> ("," or "]") ? yes
end of chain. Start again.

and if I saw the expression "[aa-f]"

the first "a" would be ignored since it is followed by another a, not
a dash. ( and my state table tells me that a char has to be followed
by a dash)

My very limited knowledge indicated that reg-exps are the first step
to understanding compiler theory. Help !

[I'd sit down with the Dragon Book and read the chapter on regular
expresssions and finite state machines. In this case, an ounce of
theory is worth several hundred pounds of hacking, since it's such a
thoroughly studied field. -John]

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.