Binary Regular Expression Matching (bregexp) (Stefan Krabbe)
Mon, 20 Mar 1995 05:58:32 GMT

          From comp.compilers


Newsgroups: comp.compilers
From: (Stefan Krabbe)
Summary: Where can I find a binary pattern matching program
Keywords: DFA, question, lex
Organization: Department of Computer Science, U of Copenhagen
Date: Mon, 20 Mar 1995 05:58:32 GMT

Question: Do you know if there exists a regexp function that works
          on binary files, preferably written in C, that I can
          grab? I'm looking for something almost like the library
          function regexp(3) from 4.3 BSD, or regcomp(3C)
          from HP-UX.

I'm looking for these features in the regexp() function:

1-It must be able to search binary data! A binary regexp-string could be

              "\377\373." (note: I speak C here. \377 is octal and equal to 255)

    meaning: match byte == 255, followed by byte == 251, followed by any byte.
    It would be nice if it could match the '\0' byte too, like:


2-I'd like to be able to specify my own whitespace (record separator).
    That way I can make it a normal text regexp if I set whitespace to newline.
    It should also be possible to set whitespace to NOWHITESPACE, i.e.
    the regexp function would then look through an entire binary
    file, if that is what I want.
    Usually the strings/files that must be searched for a match
    will be about 3-300 bytes long, but it would be nice if I could specify
    a maximum match length, especially when whitespace (record separators) can
    be turned off.
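
A minimal sketch of what features 1 and 2 imply for the interface, assuming nothing beyond standard C: both the pattern and the data are passed with explicit lengths, so a 0 byte is ordinary data rather than a string terminator, and the data length doubles as the limit on how far the search may run when no record separator is set. The names bre_match_here and bre_find are hypothetical and do not belong to any existing library. The only metacharacter handled is '.', with no escaping, so a literal byte 46 cannot be expressed in this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: does pat match at the very start of buf?
 * In this sketch '.' matches any single byte, including 0;
 * every other pattern byte must match exactly. */
static int bre_match_here(const unsigned char *pat, size_t patlen,
                          const unsigned char *buf, size_t buflen)
{
    size_t i;

    if (patlen > buflen)
        return 0;
    for (i = 0; i < patlen; i++)
        if (pat[i] != '.' && pat[i] != buf[i])
            return 0;
    return 1;
}

/* Hypothetical search function: returns the offset of the first match
 * of pat in buf, or -1 if there is none. Lengths are explicit, so
 * neither argument needs to be NUL-terminated. */
long bre_find(const unsigned char *pat, size_t patlen,
              const unsigned char *buf, size_t buflen)
{
    size_t off;

    assert(pat != NULL && buf != NULL);
    for (off = 0; off + patlen <= buflen; off++)
        if (bre_match_here(pat, patlen, buf + off, buflen - off))
            return (long)off;
    return -1;
}
```

Called with the pattern "\377\373." and a length of 3, bre_find reports the offset of the first place where byte 255 is followed by byte 251 and then any byte. A real bregexp would compile the pattern into a DFA or NFA instead of comparing byte by byte, but the length-carrying signature would stay the same.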

3-I'd like it to be possible to include subexpressions in a regexp.
    Let's say that a subexpression must be enclosed between the character
    pairs \( and \), like in sed.

              regexp =
              "[mM]y name is \([A-Za-z]*\) and I need a bregexp."

4-If I get a match, I'd like to know the offsets to the start of the
    match and the end of the match. I'd also like offsets to
    subexpression-matches (see below).

    If the text (in this case it's not binary) we try to match is:

              "Hello there, my name is Stefan and I need a bregexp."

    and we use the above regexp, then I'd like the first offsets to be

              "Hello there, my name is Stefan and I need a bregexp."
                            ^                                     ^
                        start                                     end

    the second offsets to be

              "Hello there, my name is Stefan and I need a bregexp."
                                       ^    ^
                                   start    end
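
For plain text, the offsets asked for in features 3 and 4 are roughly what the POSIX regcomp()/regexec() pair already reports through regmatch_t. The sketch below assumes a POSIX system with <regex.h>; match_offsets is a hypothetical wrapper name. It does not cover feature 1, because regexec() takes a NUL-terminated string and so cannot search data containing 0 bytes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper: compile pattern as a POSIX basic regular
 * expression, run it against text, and store up to nmatch offset
 * pairs in m. m[0] describes the whole match, m[1] the first
 * \( \) subexpression. Returns 0 on a match, nonzero otherwise. */
int match_offsets(const char *pattern, const char *text,
                  regmatch_t *m, size_t nmatch)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL && m != NULL);

    /* cflags == 0 selects basic syntax, where \( and \) delimit
     * subexpressions, as in sed. */
    rc = regcomp(&re, pattern, 0);
    if (rc != 0)
        return rc;

    rc = regexec(&re, text, nmatch, m, 0);
    regfree(&re);
    return rc;
}
```

With the regexp and text from the example, m[0].rm_so is 13 and m[0].rm_eo is 52, while m[1].rm_so is 24 and m[1].rm_eo is 30. Note that rm_eo is the offset of the first byte after the match, not of the last matched byte.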

Well, that's it. I hope you can tell me where to find it. Someone
must have made it.

Best regards,
    Stefan
