Dance Party - 6 months ago
Python Question

Extract Number from Varying String

Given this data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
                   'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
                  })
df


    ID    value
0    a     None
1    b      NaN
2    c       6D
3    d        7
4    e      10D
5    f     NONE
6    g        x
7    h  10D aaa
8    i      1 D
9    j  10 D aa
10   k        7


I'd like to extract the number from each value where one is present and return 0 otherwise, for any of the messy cases shown above.

The desired result is:

    ID  value
0    a      0
1    b      0
2    c      6
3    d      7
4    e     10
5    f      0
6    g      0
7    h     10
8    i      1
9    j     10
10   k      7


Thanks in advance!

Answer

Alternatively, following the EAFP principle, you can apply a function via applymap() that tries to extract the digits and catches the exceptions raised when there are none:

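import re

def get_number(item):
    # EAFP: try the extraction and fall back to 0 when no digits are found.
    try:
        return int(re.search(r"\d+", str(item)).group(0))
    except (AttributeError, ValueError, IndexError):
        return 0

# Restrict applymap() to the 'value' column; applied to the whole frame it would also turn every ID into 0.
# (In pandas 2.1+, DataFrame.map() is the non-deprecated name for applymap().)
df[['value']] = df[['value']].applymap(get_number)
print(df)

Prints:

    ID  value
0    a      0
1    b      0
2    c      6
3    d      7
4    e     10
5    f      0
6    g      0
7    h     10
8    i      1
9    j     10
10   k      7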
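
As a variation on the answer above, the same extraction can probably be done without a per-cell Python function, using pandas' vectorized string methods. This is only a minimal sketch and not the answerer's method: it rebuilds the frame from the question, casts every value to a string first (so NaN and the integer 7 are treated like the text entries), and assumes that the first run of digits in each value is the number wanted. The name `digits` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'ID': list('abcdefghijk'),
    'value': ['None', np.nan, '6D', '7', '10D', 'NONE', 'x',
              '10D aaa', '1 D', '10 D aa', 7],
})

# Cast to str so NaN and the integer 7 go through the same string path,
# then capture the first run of digits; rows without digits become missing.
digits = df['value'].astype(str).str.extract(r'(\d+)', expand=False)

# Replace the missing entries with '0' and convert everything to integers.
df['value'] = digits.fillna('0').astype(int)
print(df)
```

Unlike applymap(), this touches only the 'value' column, so the ID column is left as it was.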