McLeodx - 11 months ago

When would the use of re.search over re.findall in Python regex make sense?

I understand the technical difference between using re.search and re.findall in Python, but would someone with more experience explain situations in which you might use re.search over just using re.findall for regex parsing?


From the documentation, re.search(pattern, string, flags=0): Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
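To make the documented behaviour concrete, here is a small sketch using only the standard re module (in Python 3 the "MatchObject" from the quote is the re.Match type). It shows a normal match, the None case, and the zero-length case the documentation warns about: a pattern like x* still "matches" the empty string at position 0, so you get a match object rather than None.

```python
import re

# A real match: re.search returns a match object for the first location.
m = re.search(r"b+", "abbcbb")
print(m.group(0), m.span())  # bb (1, 3)

# No position matches: re.search returns None.
print(re.search(r"x+", "abc"))  # None

# A zero-length match is still a match: the object exists,
# it just matched the empty string at position 0.
z = re.search(r"x*", "abc")
print(z is not None, repr(z.group(0)), z.span())  # True '' (0, 0)
```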

i) If you just want to find out whether a pattern exists in a string, you can use re.search, e.g. searching for

a+ in the string abcdaa, i.e. re.search('a+', 'abcdaa'),

will tell you whether one or more a's are present in the string abcdaa. If a match is found, it returns a MatchObject; otherwise it returns None. It doesn't look for any further occurrences of the pattern. So, if you use re.search('a+', 'abcdaa').group(0), you will only get a for the string abcdaa.
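A quick way to see the "first match only" behaviour on the same string (a minimal sketch, standard library only): the later run aa at positions 4 to 5 is never reported, and because a match object is truthy while None is falsy, the result can be used directly in an if.

```python
import re

text = "abcdaa"
m = re.search(r"a+", text)

# re.search stops at the first match, so the later "aa" is never reported.
print(m.group(0))  # a
print(m.span())    # (0, 1)

# Typical existence check: a match object is truthy, None is falsy.
if re.search(r"a+", text):
    print("pattern found")
```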

On the other hand, re.findall returns all non-overlapping matches found in a string, e.g. ['a', 'aa'] for the string abcdaa. So, we can say that re.findall is Python's counterpart to the g (global) flag in other regex flavors, which finds all matches.
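The contrast can be checked directly. Note that re.findall (with no capturing groups in the pattern) gives back plain strings, not positions; if you need the positions of every match, re.finditer, which is not mentioned in the original answer, yields one match object per match. This is only an illustration, not a recommendation of one over the other.

```python
import re

text = "abcdaa"

# re.findall returns every non-overlapping match as a plain string.
print(re.findall(r"a+", text))  # ['a', 'aa']

# If positions are needed too, re.finditer yields one match object per match.
spans = [m.span() for m in re.finditer(r"a+", text)]
print(spans)  # [(0, 1), (4, 6)]
```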

ii) One might ask: why not use re.findall to find all the matches and, if the list is non-empty, conclude that the pattern exists?

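In that case, re.findall will be (much) slower than re.search: re.findall has to scan the entire string and build a list of every match, whereas re.search stops as soon as it finds the first one.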
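Correctness is not the issue: both checks give the same yes/no answer, so the choice comes down to cost. The sketch below uses two hypothetical helper names, exists_via_search and exists_via_findall, to confirm that on a few inputs, including a zero-length pattern.

```python
import re

def exists_via_search(pattern, text):
    # Stops scanning as soon as the first match is found.
    return re.search(pattern, text) is not None

def exists_via_findall(pattern, text):
    # Collects every match before the list can be tested for emptiness.
    return len(re.findall(pattern, text)) > 0

cases = [(r"a+", "abcdaa"), (r"x+", "abcdaa"), (r"x*", "abc")]
for pattern, text in cases:
    # Both helpers must agree on every input.
    assert exists_via_search(pattern, text) == exists_via_findall(pattern, text)
    print(pattern, text, exists_via_search(pattern, text))
```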

Comparison (Processor - Intel® Core™ i5-5200U CPU @ 2.20GHz × 4, Memory - 7.7 GiB)

On a string built by concatenating the numbers 0 through 9999999 (roughly 69 million characters), using the following code:

import re
import time

st = "".join(str(n) for n in range(10000000))

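# Time re.search: it returns as soon as it finds the first "1"
start_time = time.time()
re.search(r"1+", st)
first_time = time.time()
print("Time taken by re.search = ", first_time - start_time, "seconds")

# Time re.findall: it has to collect every run of "1"s in the whole string
re.findall(r"1+", st)
second_time = time.time()
print("Time taken by re.findall = ", second_time - first_time, "seconds")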

Output was

Time taken by re.search =  0.00011801719665527344 seconds
Time taken by re.findall =  1.7739462852478027 seconds

So, if we just want to know whether a pattern exists in a string, it's preferable to use re.search.
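One caveat: that benchmark is close to a best case for re.search, because the first "1" sits at index 1 of st, so re.search returns almost immediately. When the pattern does not occur at all, re.search also has to scan the whole string, so the gap is expected to shrink considerably. The sketch below lets you compare both cases on your own machine. It uses timeit (which relies on a high-resolution clock, unlike time.time()), a smaller input so it runs quickly, and a hypothetical helper name, compare; no particular timings are claimed here.

```python
import re
import timeit

def compare(pattern, text, number=5):
    """Return average seconds per call for re.search and re.findall."""
    search_t = timeit.timeit(lambda: re.search(pattern, text), number=number) / number
    findall_t = timeit.timeit(lambda: re.findall(pattern, text), number=number) / number
    return search_t, findall_t

# Smaller input than the original benchmark so this runs in a few seconds.
st = "".join(str(n) for n in range(200_000))

# "1" occurs at index 1, so re.search can return almost immediately.
print("early match:", compare(r"1+", st))

# "a" never occurs in a string of digits, so both calls scan everything.
print("no match:   ", compare(r"a+", st))
```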