yoghurtlee yoghurtlee - 2 months ago 5
Python Question

Why does re.match/re.search work, but re.findall doesn't work?

I used re.match to match a string like this:

import re
print(re.match(r'''#include(\s)?".*"''', '''#include "my.h"'''))


Then I got this result:

<_sre.SRE_Match object; span=(0, 15), match='#include "my.h"'>


Then I replaced re.match with re.findall:

print(re.findall(r'''#include(\s)?".*"''', '''#include "my.h"'''))


The result is:

[' ']


I am confused: why doesn't re.findall return the matched string? What's wrong with my regular expression?

Answer

From help(re.findall):

Return a list of all non-overlapping matches in the string.

If one or more capturing groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

Empty matches are included in the result.

Your parenthesized bit, (\s), is a capturing group, so re.findall returns a list of the captures. There’s only one capturing group, so each item in the list is just a string, rather than a tuple.
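
To make the rule from the help text concrete, here is a small sketch of how the shape of the re.findall result changes with the number of capturing groups. The second include line and the variable names are made up for illustration; the patterns are variations on the one in the question.

```python
import re

# Hypothetical input with two include lines (only the first is from the question).
text = '#include "my.h"\n#include "other.h"'

# No capturing group: each item is the whole match.
no_group = re.findall(r'#include\s?".*"', text)

# One capturing group: each item is that group's text only.
one_group = re.findall(r'#include(\s)?".*"', text)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'#include(\s)?"(.*)"', text)

print(no_group)    # ['#include "my.h"', '#include "other.h"']
print(one_group)   # [' ', ' ']
print(two_groups)  # [(' ', 'my.h'), (' ', 'other.h')]
```

The middle case is the one from the question: the only thing captured is the single space, so that is all re.findall reports.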

You can make the group non-capturing using ?:, i.e. (?:\s)?. That isn’t very useful at that point, though, since it’s equivalent to just \s?. For more flexibility – e.g. if you ever need to capture more than one part – re.finditer is probably the best way to go:

import re
# group(1) is the captured file name; group(0) is the whole match.
for m in re.finditer(r'#include\s*"(.*?)"', '#include "my.h"'):
    print('Included %s using %s' % (m.group(1), m.group(0)))
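
If all you want is for re.findall to return the whole match, the two simpler fixes mentioned above should be enough. A minimal sketch, with variable names chosen only for illustration:

```python
import re

line = '#include "my.h"'

# Non-capturing group: with no capturing groups left, findall returns whole matches.
with_non_capturing = re.findall(r'#include(?:\s)?".*"', line)

# Equivalent pattern with the parentheses dropped entirely.
without_group = re.findall(r'#include\s?".*"', line)

print(with_non_capturing)  # ['#include "my.h"']
print(without_group)       # ['#include "my.h"']
```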