|--- Regexps, compilers etc., prog help --- email@example.com (1997-02-11)|
|Re: --- Regexps, compilers etc., prog help --- firstname.lastname@example.org (Quinn Tyler Jackson) (1997-02-16)|
|Re: --- Regexps, compilers etc., prog help --- email@example.com (Paul David Fox) (1997-02-16)|
|Re: --- Regexps, compilers etc., prog help --- firstname.lastname@example.org (1997-02-20)|
|Re: --- Regexps, compilers etc., prog help --- email@example.com (Nick Kramer) (1997-02-20)|
|Re: --- Regexps, compilers etc., prog help --- firstname.lastname@example.org (1997-02-22)|
|Re: --- Regexps, compilers etc., prog help --- email@example.com (1997-03-14)|
|[3 later articles]|
|From:||firstname.lastname@example.org (Hursh Jain)|
|Date:||11 Feb 1997 22:24:34 -0500|
|Organization:||University of Pennsylvania|
I am interested in writing a regular expression evaluator, and also a
all the functionality of C except no pointers. But pointers can be
simulated by using lots of arrays, where the value of one array points
to an index in another array. Eventually I'll convert everything to C.
I don't know too much theory and to be honest I don't want to get down
into too much theoretical detail ( involving books on compilers etc. )
I was wondering, can someone give me some conceptual, very high level
description in writing reg-exp evaluators and pointers to hacking a
compiler to a toy language ?
My current approach to writing a reg-exp evaluator is getting bogged
down with too many case statements. Essentially my current approach in
evaulating a small part of a reg-exp, namely stuff in brackets e.g,
[a-z, 4-5] is:
read a character
Begin: go to it's corresponding type entry in a table ( an array )
read next character.
compare with next entry in the table.
if character matches , repeat
else goto begin: ( ignore everthing read so far, because it's not well
sort of, following chains ( or states ) which start from somewhere
and end somewhere. When you reach the end of the chain, that part of
the regular expression is well formed and can be further evaluated.
would mean "[" -> (char) ? yes
"a" -> (dash) ? yes
"-" -> (char) ? yes
"f" -> ("," or "]") ? yes
end of chain. Start again.
and if I saw the expression "[aa-f]"
the first "a" would be ignored since it is followed by another a, not
a dash. ( and my state table tells me that a char has to be followed
by a dash)
My very limited knowledge indicated that reg-exps are the first step
to understanding compiler theory. Help !
[I'd sit down with the Dragon Book and read the chapter on regular
expresssions and finite state machines. In this case, an ounce of
theory is worth several hundred pounds of hacking, since it's such a
thoroughly studied field. -John]
Return to the
Search the comp.compilers archives again.