Jeff Jeff - 3 months ago 41
Python Question

Python pandas extract, how to extract remaining part of string

I have looked for hours and this should be simple. I am trying to extract all the letters from a string that contains a mixture of digits and letters. Here is an example:

import pandas as pd

df = pd.Series(['ENGLANDSR11SW'])
df = df.to_frame('column')
df['ValueAfterExtract'] = df['column'].str.extract("(?P<letter>[a-zA-Z]+)")
print(df)


From the string value ENGLANDSR11SW in the dataframe, the result is ENGLANDSR, but I also want to keep the trailing letters SW, which should result in ENGLANDSRSW, meaning only the digits 11 would be removed.

How can I do this?

Answer

Replace all digits (\d) with empty strings:

In [6]: df['column'].str.replace(r'\d', '', regex=True)
Out[6]: 
0    ENGLANDSRSW
Name: column, dtype: object

Or, to remove everything which is not in [a-zA-Z] use the regexp [^a-zA-Z]. This would remove, for instance, whitespace and punctuation marks as well as digits:

In [20]: df['column'].str.replace(r'[^a-zA-Z]', '', regex=True)
Out[20]: 
0    ENGLANDSRSW
Name: column, dtype: object
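
For comparison, here is a minimal, self-contained sketch of the approaches side by side. It shows why str.extract stops at ENGLANDSR (it returns only the first match of the pattern) and how the two replacements differ. The second sample value 'AB 12-cd' is made up for illustration, and regex=True is passed explicitly on the assumption that you are on a recent pandas release, where str.replace treats the pattern as a literal string by default. The findall/join variant is an alternative that is not part of the answer above.

```python
import pandas as pd

# Second row is a hypothetical value added to show the difference
# between the two replacement patterns.
df = pd.DataFrame({'column': ['ENGLANDSR11SW', 'AB 12-cd']})

# str.extract returns only the FIRST match, so the letters after
# the digits are lost.
first_run = df['column'].str.extract(r'([a-zA-Z]+)', expand=False)

# Remove digits only; whitespace and punctuation are kept.
no_digits = df['column'].str.replace(r'\d', '', regex=True)

# Remove everything that is not an ASCII letter.
letters_only = df['column'].str.replace(r'[^a-zA-Z]', '', regex=True)

# Alternative: collect every run of letters, then join the runs.
joined = df['column'].str.findall(r'[a-zA-Z]+').str.join('')

print(pd.DataFrame({
    'first_run': first_run,
    'no_digits': no_digits,
    'letters_only': letters_only,
    'joined': joined,
}))
```

If the column can contain anything other than digits between the letters, the [^a-zA-Z] pattern (or the findall/join variant) is probably the safer choice.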