xpt xpt - 4 months ago 37
Python Question

pandas.DataFrame.replace with wildcards

Does the pandas.DataFrame.replace regex replacement support wildcards and "capture groups"?

E.g., to replace ([A-Z])(\w+) with \2\1?

Which flavor of regular expression is supported? Is Perl's regex syntax supported? E.g., is it OK to replace ([A-Z])(\w+) with \l\1\2 (\l: change the next character to lowercase)?

UPDATE:

As Steve has pointed out, according to the Python documentation, it should work, but the following is not giving me what I expected:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.choice(['foo', 'bar'], 100),
                   'B': np.random.choice(['one', 'two', 'three'], 100),
                   'C': np.random.choice(['I1', 'I2', 'I3', 'I4'], 100),
                   'D': np.random.randint(-10, 11, 100),
                   'E': np.random.randn(100)})
df.replace("f(.)(.)", "b\1\2", regex=True, inplace=True)


What's wrong?

Thx

Answer

According to the pandas documentation:

Regex substitution is performed under the hood with re.sub. The rules for substitution for re.sub are the same.

So, yes, any substitutions which can be performed with Python's re.sub (such as \1) can also be performed with pandas.DataFrame.replace. See the Python documentation for more information.
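As for the snippet in the update, a likely cause (assuming it was run exactly as posted) is the replacement being written as an ordinary string literal: Python reads "\1\2" as octal escapes and turns them into control characters before re.sub ever sees them. Writing the replacement as a raw string keeps the backslashes. Python's re module also has no \l, so the lowercase case needs a callable replacement instead. The sketch below uses a small fixed frame in place of the random one, and the helper name lower_first is made up for illustration.

```python
import re

import pandas as pd

# Small deterministic frame standing in for the random one in the question.
df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": ["one", "two", "three"]})

# Non-raw literal: "\1\2" is already chr(1) + chr(2) by the time re.sub
# receives it, so no group reference is left to expand.
broken = re.sub("f(.)(.)", "b\1\2", "foo")

# Raw literals keep the backslashes, so \1 and \2 reach re.sub intact.
fixed = df.replace(r"f(.)(.)", r"b\1\2", regex=True)

# Swapping the two groups, as in the first example of the question.
swapped = re.sub(r"([A-Z])(\w+)", r"\2\1", "Foo")


# Python's re has no \l; a callable replacement can lowercase a group.
def lower_first(match):
    return match.group(1).lower() + match.group(2)


# Series.str.replace accepts a callable when regex=True.
lowered = pd.Series(["Foo", "Bar"]).str.replace(
    r"([A-Z])(\w+)", lower_first, regex=True
)
```

Here the callable goes through Series.str.replace, applied one column at a time, because that method is documented to accept a function as the replacement.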
