McLeodx - 5 months ago
Python Question

When would the use of re.search over re.findall in Python Regex make sense?

I understand the technical difference between using re.search and re.findall in Python, but would someone with more experience explain situations in which you might use re.search over just using re.findall for regex parsing?

Answer

From the documentation:

re.search(pattern, string, flags=0) :- Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
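The distinction the documentation draws between "no match" and "a zero-length match" can be seen in a small sketch. The sample string "bbb" and the variable names are chosen here purely for illustration:

```python
import re

# 'a*' can match the empty string, so the search succeeds at position 0
zero_length = re.search(r"a*", "bbb")

# 'a+' needs at least one 'a', which "bbb" does not contain
no_match = re.search(r"a+", "bbb")

print(repr(zero_length.group(0)))  # ''
print(zero_length.span())          # (0, 0)
print(no_match)                    # None
```

A zero-length match is still a match object, so it is truthy in an if test, while None is not.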

i) If you just want to find out whether a pattern occurs anywhere in a string, you can use re.search. For example, searching for

a+ in the string abcdaa

tells you whether the string abcdaa contains one or more a characters. If a match is found, re.search returns a MatchObject; otherwise it returns None. It does not check for any further occurrences of the pattern. So re.search('a+', 'abcdaa').group(0) returns only 'a' for the string abcdaa.

On the other hand, re.findall returns all non-overlapping matches found in the string as a list: ['a', 'aa'] for the string abcdaa. In that sense, re.findall is Python's equivalent of the g (global) flag in other regex flavours, which finds all matches.
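The two calls on the string abcdaa can be compared side by side. This is a minimal sketch; the variable names are illustrative only:

```python
import re

text = "abcdaa"

first = re.search(r"a+", text)    # stops at the first match
every = re.findall(r"a+", text)   # scans the whole string

print(first.group(0))  # a
print(first.span())    # (0, 1)
print(every)           # ['a', 'aa']
```

Note that re.search gives back a match object, which also carries the position of the match, while re.findall gives back plain strings.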

ii) One might ask: why not use re.findall to collect all the matches and treat a non-empty list as proof that the pattern exists?

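In that case, re.findall can be much slower than re.search, because it has to scan the entire string and build a list of every match, whereas re.search stops at the first match.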

Comparison (Processor - Intel® Core™ i5-5200U CPU @ 2.20GHz × 4, Memory - 7.7 GiB)

On a string built by concatenating the integers 0 through 9999999 (about 68.9 million characters), using the following code

import re
import time

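# Build a long test string by concatenating the integers 0..9999999
st = "".join(str(n) for n in range(10000000))

# Time re.search, which returns as soon as the first match is found
start_time = time.time()
re.search(r"1+", st)
first_time = time.time()
print("Time taken by re.search = ", first_time - start_time, "seconds")

# Time re.findall, which scans the whole string and collects every match
re.findall(r"1+", st)
second_time = time.time()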
print("Time taken by re.findall = ", second_time - first_time, "seconds")

Output was

Time taken by re.search =  0.00011801719665527344 seconds
Time taken by re.findall =  1.7739462852478027 seconds

So, if you just want to know whether a pattern occurs in a string, it's preferable to use re.search.
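An existence check along these lines might look like the sketch below. The helper name has_match and the sample strings are hypothetical, introduced only for illustration:

```python
import re

def has_match(pattern, text):
    """Return True if pattern matches anywhere in text (hypothetical helper)."""
    return re.search(pattern, text) is not None

print(has_match(r"a+", "abcdaa"))  # True
print(has_match(r"a+", "xyz"))     # False

# re.search returns None when nothing matches, so guard before calling .group()
m = re.search(r"\d+", "order 42 shipped")
number = m.group(0) if m else None
print(number)  # 42
```

Calling .group(0) directly on the result of re.search raises AttributeError when there is no match, which is why the guard is worth keeping.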
