Related articles:
Binary Regular Expression Matching (bregexp) cthulhu@diku.dk (1995-03-20)
Re: Binary Regular Expression Matching (bregexp) henry@zoo.toronto.edu (1995-03-22)
Newsgroups: comp.compilers
From: cthulhu@diku.dk (Stefan Krabbe)
Summary: Where can I find a binary pattern matching program
Keywords: DFA, question, lex
Organization: Department of Computer Science, U of Copenhagen
Date: Mon, 20 Mar 1995 05:58:32 GMT
Question: Does anyone know of a regexp function that works
on binary files, preferably written in C, that I can
grab? I'm looking for something much like the library
function regexp(3) from 4.3 BSD, or regcomp(3C)
from HP-UX.
I'm looking for these features in the regexp() function:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1-It must be able to search binary data! A binary regexp-string could be
"\377\373." (note: I speak C here. \377 is octal and equal to 255)
meaning: match byte == 255, followed by byte == 251, followed by any byte.
It would be nice if it could match the '\0' byte too, as in:
"asd\000asd".
2-I'd like to be able to specify my own whitespace (record separator).
That way I can make it a normal text regexp if I set whitespace to newline.
It should also be possible to set whitespace to NOWHITESPACE, i.e.
the regexp function would have to look through an entire binary
file, if that was what I wanted.
Usually the strings/files that must be searched for a match
will be about 3-300 bytes long, but it would be nice if I could specify
a maximum match length, especially when whitespace (record separators) can
be turned off.
3-I'd like it to be possible to include subexpressions in a regexp.
Let's say that a subexpression must be enclosed between the character
pairs \( and \), like in sed.
Example:
regexp =
"[mM]y name is \([A-Za-z]*\) and I need a bregexp."
4-If I get a match, I'd like to know the offsets to the start of the
match and the end of the match. I'd also like offsets to
subexpression matches (see below).
Example:
If the text (in this case it's not binary) we try to match is:
"Hello there, my name is Stefan and I need a bregexp."
and we use the above regexp, then I'd like the first offsets to be
"Hello there, my name is Stefan and I need a bregexp."
              ^                                      ^
              start                                  end
the second offsets to be
"Hello there, my name is Stefan and I need a bregexp."
                         ^     ^
                         start end
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
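To make the wish list concrete, here is a minimal sketch in C of two of the
requested pieces. It is not an existing bregexp library. bre_find(), BRE_ANY
and name_offsets() are hypothetical names made up for illustration.
bre_find() covers only the simplest case of feature 1 (literal bytes plus an
"any byte" wildcard, with explicit lengths so that '\0' is an ordinary byte).
name_offsets() shows features 3 and 4 using the POSIX regcomp()/regexec()
interface, which reports offsets through regmatch_t but works on
NUL-terminated strings and so does not by itself satisfy feature 1.

```c
#include <regex.h>
#include <stddef.h>

#define BRE_ANY (-1)  /* hypothetical marker: matches any single byte */

/* Feature 1 (simplest case): offset of the first match of pat[0..patlen)
 * in buf[0..buflen), or -1 if there is none. Lengths are explicit, so
 * '\0' is treated like any other byte. */
long bre_find(const unsigned char *buf, size_t buflen,
              const int *pat, size_t patlen)
{
    size_t i, j;

    for (i = 0; i + patlen <= buflen; i++) {
        for (j = 0; j < patlen; j++)
            if (pat[j] != BRE_ANY && pat[j] != buf[i + j])
                break;
        if (j == patlen)
            return (long)i;
    }
    return -1;
}

/* Features 3 and 4: run the example pattern from the post against a
 * NUL-terminated text. On a match, so[0]/eo[0] receive the offsets of the
 * whole match and so[1]/eo[1] those of the \( \) subexpression.
 * Returns 0 on a match, nonzero otherwise. */
int name_offsets(const char *text, long so[2], long eo[2])
{
    regex_t re;
    regmatch_t m[2];
    int rc, i;

    /* cflags == 0 selects basic regular expressions, where \( \) group. */
    if (regcomp(&re,
                "[mM]y name is \\([A-Za-z]*\\) and I need a bregexp.", 0) != 0)
        return -1;
    rc = regexec(&re, text, 2, m, 0);
    if (rc == 0) {
        for (i = 0; i < 2; i++) {
            so[i] = (long)m[i].rm_so;  /* first byte of the match */
            eo[i] = (long)m[i].rm_eo;  /* one past the last matched byte */
        }
    }
    regfree(&re);
    return rc;
}
```

With the pattern { 0377, 0373, BRE_ANY }, bre_find() returns the offset of the
first 255, 251, any-byte sequence in the buffer. For the example sentence,
name_offsets() should report the whole match at offsets 13 to 52 and the
subexpression "Stefan" at 24 to 30, where each end offset points just past
the last matched character.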
Well, that's it. I hope you can tell me where it is. Someone
must have made it already... no?
Best Regards
Stefan - cthulhu@diku.dk
--