Related articles |
---|
Hashtable alternatives gwynfa@paradise.net.nz (Gwynfa) (2000-07-27) |
perfect hashing preston@tera.com (Preston Briggs) (2000-08-04) |
Re: perfect hashing ceco@jupiter.com (Tzvetan Mikov) (2000-08-05) |
Re: perfect hashing jsgray@acm.org (Jan Gray) (2000-08-09) |
Re: perfect hashing jmochel@foliage.com (2000-08-10) |
Re: perfect hashing fjh@cs.mu.OZ.AU (2000-08-10) |
Re: perfect hashing parz@home.com (Parzival) (2000-08-10) |
Re: perfect hashing rick@home.com (2000-08-13) |
Re: perfect hashing lars@bearnip.com (Lars Duening) (2000-08-13) |
Re: perfect hashing pmk@cray.com (2000-08-13) |
Re: perfect hashing tej@melbpc.org.au (Tim Josling) (2000-08-13) |
Re: perfect hashing bob_jenkins@burtleburtle.net (2000-08-13) |
Re: perfect hashing intmktg@Gloria.CAM.ORG (Marc Tardif) (2000-08-13) |
From: | rick@home.com (Rick Ohnemus) |
Newsgroups: | comp.compilers |
Date: | 13 Aug 2000 18:52:29 -0400 |
Organization: | none |
References: | 00-07-064 00-08-026 00-08-031 00-08-059 |
Keywords: | symbols |
>"Tzvetan Mikov" <ceco@jupiter.com> writes:
>Another approach for detecting keywords which can be even more
>efficient is to just write a big multi-level switch statement
>where you test one character at a time. For example, if you have
>three keywords "the", "this", and "foo", you can use the following
>code:
>
> << example source deleted and other comments deleted >>
>
>Of course, if there was a tool to generate code in that style, given a
>list of keywords, then that would solve the maintainability issue.
>I imagine it would not be difficult to write such a tool.
I have enclosed a piece of a perl script that I use to generate a
series of C/C++ switch statements to recognize a fixed set of
keywords. The real script reads a template file and inserts the code
in the appropriate place. That is very specific to my application so
I just cut out the relevant parts.
The included script just generates the switch statements. You need to add any
preprocessing (converting to lower/upper case, etc.) and an error return at
the end.
For example:
enum { NOT_KW, KW1, KW2, KW3 } kw_code;
#define MAX_KW_LEN 8
kw_code
match_keyword(const char *str)
{
size_t i;
const char * kw;
char us[MAX_KW_LEN + 1]; /* uppercased copy of str */
/* make sure str is uppercase */
for (i = 0; *str != '\0' && i < MAX_KW_LEN; ++i, ++str) {
us[i] = islower(*str) ? toupper(*str) : *str;
}
if (i == 0 || *str != '\0') {
return NOT_KW;
}
us[i] = '\0';
kw = us; /* point to the uppercased string */
<< generated switch code goes here >>
return NOT_KW;
}
Here is the script:
#!/usr/local/bin/perl
#
# Generate nested C switch statements to do quick keyword recognition.
# The generated code uses kw as the variable pointing to the keyword to
# match. kw should be declared as 'const char *kw' (or the equivalent).
#
# Input is read from stdin and must be in the following format:
# keyword return_value
# Leading/trailing white space is stripped. White space between the
# keyword and return value is also removed.
#
# Output is written to stdout.
#
# Switches:
# -i n specifies the number of spaces per indent level. The
# default is 4.
# -l convert all input keywords to lowercase
# -u convert all input keywords to uppercase
use strict;
use Getopt::Std;
my $prog = $0;
sub usage() {
print STDERR <<'USAGE';
usage: $prog [-i indent_spaces] [-l | -u]
-i specifies the number of spaces per indentation level (default is 4)
-l convert all keywords to lowercase
-u convert all keywords to uppercase
USAGE
exit 1;
}
my $switches = 'i:lu';
my %opts;
if (!getopts($switches, \%opts)) {
usage();
}
my $nspi; # number spaces per indent
if (exists $opts{'i'}) {
usage() if ($opts{'i'} <= 0);
$nspi = $opts{'i'};
}
else {
$nspi = 4;
}
my $lower = 0;
my $upper = 0;
if (exists $opts{'l'}) {
if (exists $opts{'u'}) {
print STDERR "error: -l and -u can not be used together\n";
exit 1;
}
$lower = 1;
}
elsif (exists $opts{'u'}) {
$upper = 1;
}
sub write_switch($\@);
my $indent = 1;
my @keywords;
my $error = 0;
my $line;
while (defined($line = <>)) {
my ($keyword, $rv) = $line =~ /^\s*([^\s]+)\s+([^\s]+)\s*$/;
next if (length($keyword) == 0);
if (length($rv) == 0) {
print STDERR "error: '$keyword' does not have a return value\n";
$error = 1;
}
if ($lower) {
$keyword = lc $keyword;
}
elsif ($upper) {
$keyword = uc $keyword;
}
push @keywords, [ $keyword, $rv, $keyword ];
}
exit 1 if $error;
if (@keywords == 0) {
print STDERR "no keywords\n";
exit 1;
}
@keywords = sort { $a->[0] cmp $b->[0] } @keywords;
write_switch(0, @keywords);
exit 0;
sub write_switch($\@) {
my $sub_switch = $_[0];
my $spaces = ' ' x (($sub_switch + 1) * $nspi);
my @keywords = @{$_[1]};
my @cache = ();
my $prev = '';
if (@keywords == 1) {
if ($keywords[0][0] eq '') {
print "${spaces}if (*kw == '\\0') {\n";
}
else {
print "${spaces}if (strcmp(kw, \"$keywords[0][0]\") == 0) {\n";
}
print "${spaces} return $keywords[0][1]; /* $keywords[0][2] */\n";
print "${spaces}}\n";
print "${spaces}break;\n";
}
else {
print "${spaces}switch (*kw++) {\n";
foreach my $w (@keywords) {
if ($w->[0] eq '') {
print "${spaces}case '\\0':\n";
print "${spaces} return $w->[1]; /* $w->[2] */\n";
next;
}
my $fc = substr($w->[0], 0, 1);
substr($w->[0], 0, 1) = '';
if ($fc ne $prev) {
if ($prev ne '') {
print "${spaces}case '$prev':\n";
write_switch($sub_switch + 1, @cache);
}
@cache = ();
$prev = $fc;
}
push @cache, [ $w->[0], $w->[1], $w->[2] ];
}
if (@cache) {
print "${spaces}case '$prev':\n";
write_switch($sub_switch + 1, @cache);
}
print "${spaces}}\n";
print "${spaces}break;\n" if $sub_switch;
}
return;
}
Return to the
comp.compilers page.
Search the
comp.compilers archives again.