Azfar Faizan Azfar Faizan - 6 months ago 33
Python Question

Create a new column using specific columns in Pandas using DataFrame.apply

I have a data frame like this

ID 8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb LowerBound UpperBound Problem
001 618 720 645 573 503 447 401.329 662.670 False
002 62 80 67 94 81 65 45.710 126.289 False
003 32 10 23 26 26 31 12.314 58.114 True
004 22 13 1 28 19 25 16.438 41.418 True
005 9 7 9 6 8 4 1.182 20.102 False


I want to create a new column which would be a Boolean column such that I want to iterate through all the weeks for each ID and if any value lies outside the upper and lower bound column I set it equal to True else False. The upper and lower bound values in this case are dummy so the data will not return these values. The resulting column should be like the
Problem
column

I know the hard way of doing this which is absolutely inefficient

import pandas as pd

def Problem(df):
r = []
for i in range(len(df)):
res = []
x = [df['Week1'][i], df['Week2'][i], df['Week3'][i], df['Week4'][i], df['Week5'][i]]
for j in range (len(x)):
if (df['LowerBound'][i] <= x[j] <= df['UpperBound'][i]): res.append(True)
else: res.append(False)
if (False in res): r.append(True)
else: r.append(False)
return r

df['Problem'] = Problem(df)


This will work but it is long, hard and inefficient way. I know there is df.apply which can do this for me but I don't understand how to convert my specific function into that. Can someone help ? Thanks

Answer Source

You can do this more succinctly using apply and calling between to test if each row's values are within range, invert the result using ~ and calling any to test if there are any positive values:

In [24]:
df['Problem'] = df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)
df

Out[24]:
   ID  8-Jan  15-Jan  22-Jan  29-Jan  5-Feb  12-Feb  LowerBound  UpperBound  \
0   1    618     720     645     573    503     447     401.329     662.670   
1   2     62      80      67      94     81      65      45.710     126.289   
2   3     32      10      23      26     26      31      12.314      58.114   
3   4     22      13       1      28     19      25      16.438      41.418   
4   5      9       7       9       6      8       4       1.182      20.102   

  Problem  
0    True  
1   False  
2    True  
3    True  
4   False  

We can see the individual steps here:

In [25]:
df.apply(lambda x: x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)

Out[25]:
  8-Jan 15-Jan 22-Jan 29-Jan 5-Feb 12-Feb
0  True  False   True   True  True   True
1  True   True   True   True  True   True
2  True  False   True   True  True   True
3  True  False  False   True  True   True
4  True   True   True   True  True   True

invert the mask using ~:

In [26]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1)

Out[26]:
   8-Jan 15-Jan 22-Jan 29-Jan  5-Feb 12-Feb
0  False   True  False  False  False  False
1  False  False  False  False  False  False
2  False   True  False  False  False  False
3  False   True   True  False  False  False
4  False  False  False  False  False  False

now test if any row values are positive using any:

In [27]:
df.apply(lambda x: ~x.loc['8-Jan':'12-Feb'].between(x['LowerBound'], x['UpperBound']), axis=1).any(axis=1)

Out[27]:
0     True
1    False
2     True
3     True
4    False
dtype: bool