lidl - 1 year ago 83
Python Question

# re.search becomes unresponsive

When I run this code, it prints neither `'checked'` nor `'not matching'`. It stops responding completely.

``````
import re

url = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/topic/55-13950.html?from=home'

m = re.search(r'/\d-(B|(\w+){10,64})/index.html', url)
if m:
    print('checked')
else:
    print('not matching')
``````

Suppose we have the following script:

``````
import re

s = '1234567890'
m = re.search(r'(\w+)*z', s)
``````

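Our string contains 10 digits and does not contain `'z'`. This is intentional: because no match is possible, `re.search` is forced to try every way of dividing the digits among repetitions of `(\w+)` before it can give up. If a match existed, it would stop at the first one it found.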
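To illustrate the second half of that claim, here is a minimal sketch that appends a `'z'` to the same string (an illustrative variation, not part of the original script). Since a match now exists, the search should return right away instead of exhausting every combination.

```python
import re

# Same pattern as above; the trailing 'z' is added purely for illustration
s = '1234567890z'
m = re.search(r'(\w+)*z', s)

# Whole matched text, or None if nothing matched
matched = m.group(0) if m else None
print(matched)
```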

I can't calculate the exact number of possible combinations, since the math involved is rather tricky, but here is a small demonstration of what happens as `s` gets more digits:

Time goes from roughly 1 μs for a single-digit `s` to 100 seconds for a 30-digit `s`, that is, about 10^8 times longer.
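For a rough sense of where this growth comes from, one simplified model is to count the ways a run of `n` word characters can be cut into consecutive non-empty pieces, since each repetition of `(\w+)` may claim any such piece. This is an assumption made for illustration, not a description of the `re` engine's internals, and `count_splits` is a hypothetical helper name introduced for this sketch.

```python
def count_splits(n):
    """Count the ways to cut a run of n characters into consecutive non-empty pieces."""
    # ways[k] holds the number of ways to split the first k characters;
    # the empty prefix can be "split" in exactly one way.
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        # The last piece may start at any earlier cut position j
        ways[k] = sum(ways[j] for j in range(k))
    return ways[n]


for n in (1, 5, 10, 30):
    print(n, count_splits(n))
```

Under this model the count doubles with every extra character (it equals 2^(n-1)), which fits the exponential-looking timings. The engine's real workload also involves shorter prefixes and other starting positions, so treat this as an order-of-magnitude illustration only.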

My guess is that something similar happens when you use `(\w+){10,64}`, because it also nests one quantifier inside another. Instead, you should use `\w{10,64}`, which avoids the nested quantifier.
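As a minimal sketch of that suggestion, the snippet below reruns the code from the question with only that one substitution. The variable name `result` is introduced here for illustration.

```python
import re

url = 'http://hoswifi.bblink.cn/v3/2-fd1cc0657845832e5e1248e6539a50fa/topic/55-13950.html?from=home'

# Only change from the question's pattern: (\w+){10,64} -> \w{10,64}
m = re.search(r'/\d-(B|\w{10,64})/index.html', url)

result = 'checked' if m else 'not matching'
print(result)
```

This URL contains `/topic/...` rather than `/index.html`, so the result is `'not matching'`, and it comes back without the hang. Note that the two patterns are not strictly equivalent: `\w{10,64}` caps the run at 64 characters, whereas `(\w+){10,64}` accepts any run of 10 or more, so the simpler form is probably closer to what was intended.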

Code used for the demo:

``````
import timeit
import matplotlib.pyplot as plt

setup = """
import re
"""
# Raw string, so that \w reaches the timed statement unchanged
_base_stmt = r"m = re.search(r'(\w+)*z','{}')"

# (searched string becomes '1', '11', '111'...)
statements = {}
for i in range(1, 18):
    statements.update({i: _base_stmt.format('1'*i)})

# Creates x, y values
x = []
y = []
for i in sorted(statements):
    x.append(i)
    y.append(timeit.timeit(statements[i], setup, number=1))

# Plot
plt.plot(x, y)
plt.xlabel('string length')
plt.ylabel('time(sec)')
plt.show()
``````