Andreuccio - 1 month ago
Python Question

Assign a value to a column based on other columns of the same pandas dataframe

In a dataframe, I have one column, called TM52_fail:


2
1
-
1 & 2
1 & 2 & 3
-
-
3
etc.


I would like to create an additional column, called TM52_fail_norm, whose content depends on the content of the column TM52_fail.
My attempt (which includes the conditional filling):

def str_to_number(x):
    if x=="1" or x=="2" or x=="3":
        return 1
    elif x=="1 & 2" or x=="2 & 3" or x=="1 & 3":
        return 2
    elif x=="1 & 2 & 3":
        return 3
    else:
        return 0

df['TM52_fail_norm'] = ""
df['TM52_fail_norm'].apply(lambda x: str_to_number(x for x in df['TM52_fail']))


returns an empty column (I presume as a result of df['TM52_fail_norm'] = "").

Answer

I think you need to cast the column to string with astype and then apply the function str_to_number:

df['new'] = df['TM52_fail_norm'].astype(str).apply(str_to_number)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
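
For reference, here is a minimal self-contained sketch of the same idea using the question's column names (TM52_fail as input, TM52_fail_norm as output) rather than the names in the output above. The sample frame is an assumption, rebuilt from the values listed in the question. It also shows what probably went wrong in the original attempt: apply returns a new Series instead of modifying the column in place, so its result has to be assigned, and it should be called on the source column.

```python
import pandas as pd


def str_to_number(x):
    # Same mapping as in the question, written with membership tests.
    if x in ("1", "2", "3"):
        return 1
    elif x in ("1 & 2", "2 & 3", "1 & 3"):
        return 2
    elif x == "1 & 2 & 3":
        return 3
    else:
        return 0


# Assumed sample data, taken from the values shown in the question.
df = pd.DataFrame(
    {"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]}
)

# apply returns a new Series, so assign it to the target column.
df["TM52_fail_norm"] = df["TM52_fail"].astype(str).apply(str_to_number)
print(df)
```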

Another solution is to map with a dict, then replace the unmatched values with 0 using fillna and cast to int:

d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

df['new'] = df['TM52_fail_norm'].map(d)
df['new'] = df['new'].fillna(0).astype(int)
print (df)
  TM52_fail_norm  new
0              2    1
1              1    1
2              -    0
3          1 & 2    2
4      1 & 2 & 3    3
5              -    0
6              -    0
7              3    1
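
To see why the fillna and astype(int) steps are needed, the sketch below keeps the intermediate result in a separate variable (mapped, a name introduced only for illustration). It again assumes the question's column names and a sample frame rebuilt from the question's values. Any value that is not a key of the dict, such as "-", comes back from map as NaN, which forces a float column until the gaps are filled.

```python
import pandas as pd

# Assumed sample data, taken from the values shown in the question.
df = pd.DataFrame(
    {"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]}
)

d = {"1": 1, "2": 1, "3": 1,
     "1 & 2": 2, "2 & 3": 2, "1 & 3": 2,
     "1 & 2 & 3": 3}

# Values missing from the dict (here "-") become NaN after map.
mapped = df["TM52_fail"].map(d)
print(mapped)

# Fill the gaps with 0, then cast back to integers.
df["TM52_fail_norm"] = mapped.fillna(0).astype(int)
print(df)
```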

Timings:

#[800000 rows x 1 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [315]: %timeit (jez1(df))
10 loops, best of 3: 63 ms per loop

In [316]: %timeit (df['TM52_fail_norm'].astype(str).apply(str_to_number))
1 loop, best of 3: 518 ms per loop

#http://stackoverflow.com/a/40176883/2901002
In [345]: %timeit (df.TM52_fail_norm.str.count(r'\d+'))
1 loop, best of 3: 707 ms per loop

The function jez1 used in the first timing is the map solution wrapped in a function:

def jez1(df):
    d = {'1':1,'2':1,'3':1,'1 & 2':2, '2 & 3':2, '1 & 3':2,'1 & 2 & 3':3}

    df['new'] = df['TM52_fail_norm'].map(d)
    df['new'] = df['new'].fillna(0).astype(int)
    return (df)

print (jez1(df))
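
The third timing uses str.count from the linked answer. A minimal sketch of that approach, again assuming the question's column names and sample values, might look like this. Note that it counts every run of digits, so unlike the dict it would also count values outside 1 to 3 (for example "4"); whether that matters depends on the data.

```python
import pandas as pd

# Assumed sample data, taken from the values shown in the question.
df = pd.DataFrame(
    {"TM52_fail": ["2", "1", "-", "1 & 2", "1 & 2 & 3", "-", "-", "3"]}
)

# Count the runs of digits in each string; "-" contains none and gives 0.
df["TM52_fail_norm"] = df["TM52_fail"].astype(str).str.count(r"\d+")
print(df)
```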