evoliptic evoliptic - 2 months ago 6
Python Question

re and binary file, python strange behavior

i have a strange behavior of re.search on a binary file. Here is my python screenshot :

enter image description here

As you can see, i have two problems :


  • re can't find a pattern which is indeed in the file

  • include "\x5b" in an expression results in an error



Any idea?

Answer

\x5b is the ASCII [ character, the left square bracket. That's a regex meta character forming the start of a [...] character class specification and needs to be escaped if you want to match a literal [ character:

>>> import re
>>> re.search('[', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/re.py", line 146, in search
    return _compile(pattern, flags).search(string)
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of regular expression
>>> re.search('\[', '')

The same applies to \x41, that's the ^ character, which in a regex context matches the start of the string only, not the literal character ^. Since you tried to match data before the ^ point the regex can't match anything, simply because that makes the anchor invalid.

If you are only searching for literal text matches, don't use a regex. You could just use str.find() or str.index() to get the index of matched text.

If you are using this in a larger expression and generate the expression from data, then use re.escape() to ensure all metacharacters are properly escaped first.