Bruce Bruce - 1 month ago 9

Python regular expression \W: with vs. without parentheses

Below is a quick demo that uses \W+ to match runs of non-word characters and split a given string on them. Why does the result differ depending on whether the pattern is wrapped in parentheses?

>>> import re
>>> s = "abc:def:ghi"
>>> p = r"(\W+)"   # \W+ wrapped in a capturing group
>>> q = r"\W+"     # the same pattern without a group
>>> re.split(p, s, flags=re.UNICODE)
['abc', ':', 'def', ':', 'ghi']
>>> re.split(q, s, flags=re.UNICODE)
['abc', 'def', 'ghi']

Answer

From the documentation of re.split() in the re module:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
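A short sketch (not part of the original answer) of what "all groups" means in practice: with two capturing groups, both group values are inserted at every split point, and a group that did not take part in a particular match contributes None. Keeping the separators also means the original string can be rebuilt with "".join(). The sample strings and variable names are illustrative only.

```python
import re

# Two capturing groups: after each split point, the text of *every* group
# is inserted, including a group that did not participate (it yields None).
two_groups = re.split(r"(-)|(:)", "a-b:c")
print(two_groups)  # ['a', '-', None, 'b', None, ':', 'c']

# With a single group the separators are kept, so joining the pieces
# reproduces the input exactly.
s = "abc:def:ghi"
pieces = re.split(r"(\W+)", s)
print("".join(pieces) == s)  # True
```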

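For reference, wrapping part of a regular expression in parentheses creates a capturing group: the text it matches is remembered so it can later be referenced on its own, for example with match.group(1) or with the backreference \1 inside the pattern. In q there is no group, so re.split() discards the separators; in p the whole separator is a group, so each ':' it matches is returned in the list between the pieces.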
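A minimal sketch of both ways a group can be referenced, plus the non-capturing form (?:...), which the original answer does not mention: it groups without capturing, so re.split() behaves as if there were no parentheses at all. Variable names are illustrative.

```python
import re

s = "abc:def:ghi"

# Retrieve each capturing group individually after a match.
m = re.search(r"(\w+)(\W+)", s)
first_word, sep = m.group(1), m.group(2)
print(first_word, repr(sep))  # abc ':'

# Refer to group 1 inside the pattern itself: \1 must repeat what (\w) matched.
doubled = re.search(r"(\w)\1", "hello").group(0)
print(doubled)  # ll

# (?:...) groups without capturing, so the separators are dropped,
# exactly as with the bare pattern \W+.
non_capturing = re.split(r"(?:\W+)", s)
print(non_capturing)  # ['abc', 'def', 'ghi']
```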