Pansy Pansy - 4 months ago 9
Python Question

Conditionally Select and Set Column Values

I have two dataframes. I need to copy the values of df2.faults column to df1.faults column based on the values of unit and date.

The two dataframes have different lengths. df1 may contain duplicate (unit, date) pairs, whereas df2 does not.
An example that mimics my dataset:

import pandas as pd

df1 = pd.DataFrame({'unit': ['x']*5 + ['y']*6 + ['z']*5,
                    'date': ['2016-06-14', '2016-06-14', '2016-06-15', '2016-06-16', '2016-06-16',
                             '2016-06-14', '2016-06-14', '2016-06-15', '2016-06-15', '2016-06-16', '2016-06-16',
                             '2016-06-15', '2016-06-16', '2016-06-16', '2016-06-17', '2016-06-17'],
                    'faults': None})
df1.date = pd.to_datetime(df1.date)
print(df1)
         date faults unit
0  2016-06-14   None    x
1  2016-06-14   None    x
2  2016-06-15   None    x
3  2016-06-16   None    x
4  2016-06-16   None    x
5  2016-06-14   None    y
6  2016-06-14   None    y
7  2016-06-15   None    y
8  2016-06-15   None    y
9  2016-06-16   None    y
10 2016-06-16   None    y
11 2016-06-15   None    z
12 2016-06-16   None    z
13 2016-06-16   None    z
14 2016-06-17   None    z
15 2016-06-17   None    z

df2 = pd.DataFrame({'unit': ['x']*3+['y']*3 + ['z']*3,
'date': ['2016-06-14', '2016-06-15', '2016-06-16',
'2016-06-14', '2016-06-15', '2016-06-16',
'2016-06-15', '2016-06-16', '2016-06-17'],
'faults': [76, 12, 30, 45, 23, 25, 10, 26, 43]})
df2.date = pd.to_datetime(df2.date)
print(df2)
        date faults unit
0 2016-06-14     76    x
1 2016-06-15     12    x
2 2016-06-16     30    x
3 2016-06-14     45    y
4 2016-06-15     23    y
5 2016-06-16     25    y
6 2016-06-15     10    z
7 2016-06-16     26    z
8 2016-06-17     43    z


The required output using nested loops:

for u in pd.unique(df2.unit):
    for d in pd.unique(df2[df2.unit == u].date):
        # .loc replaces the removed .ix indexer; .iloc[0] picks the single matching value
        df1.loc[(df1.unit == u) & (df1.date == d), 'faults'] = int(df2.loc[(df2.unit == u) & (df2.date == d), 'faults'].iloc[0])
print(df1)
         date faults unit
0  2016-06-14     76    x
1  2016-06-14     76    x
2  2016-06-15     12    x
3  2016-06-16     30    x
4  2016-06-16     30    x
5  2016-06-14     45    y
6  2016-06-14     45    y
7  2016-06-15     23    y
8  2016-06-15     23    y
9  2016-06-16     25    y
10 2016-06-16     25    y
11 2016-06-15     10    z
12 2016-06-16     26    z
13 2016-06-16     26    z
14 2016-06-17     43    z
15 2016-06-17     43    z


I can't think of an efficient approach! List comprehension, conditional indexing, ...? Am I missing something?

Thanks!

Update



A one-loop solution is:

for index, row in df2.iterrows():
    # .loc replaces the removed .ix indexer
    df1.loc[(df1.unit == row['unit']) & (df1.date == row['date']), 'faults'] = row['faults']


Is there a more efficient solution? My dataset is large enough that I want to avoid loops altogether.

Answer

Simple: use a left merge:

df1 = pd.merge(df1,df2,how='left',on=['date','unit'])
df1 =  
         date faults_x unit  faults_y
0  2016-06-14     None    x        76
1  2016-06-14     None    x        76
2  2016-06-15     None    x        12
3  2016-06-16     None    x        30
4  2016-06-16     None    x        30
5  2016-06-14     None    y        45
6  2016-06-14     None    y        45
7  2016-06-15     None    y        23
8  2016-06-15     None    y        23
9  2016-06-16     None    y        25
10 2016-06-16     None    y        25
11 2016-06-15     None    z        10
12 2016-06-16     None    z        26
13 2016-06-16     None    z        26
14 2016-06-17     None    z        43
15 2016-06-17     None    z        43
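
The faults_x and faults_y columns above appear because both frames carry a non-key column named faults, and pd.merge disambiguates overlapping names with its default suffixes ('_x' for the left frame, '_y' for the right). A minimal sketch on toy frames (left_demo and right_demo are made-up names, not part of the question's data) showing the default behaviour and the suffixes parameter as one possible way to control the resulting names:

```python
import pandas as pd

# Hypothetical toy frames that mimic the overlap: both have a 'faults' column.
left_demo = pd.DataFrame({"unit": ["x", "x"],
                          "date": ["2016-06-14", "2016-06-14"],
                          "faults": None})
right_demo = pd.DataFrame({"unit": ["x"],
                           "date": ["2016-06-14"],
                           "faults": [76]})

# Default suffixes: the overlapping column becomes faults_x / faults_y.
default_merge = pd.merge(left_demo, right_demo, how="left", on=["date", "unit"])
print(sorted(default_merge.columns))

# Custom suffixes: keep the right-hand name as plain 'faults'.
custom_merge = pd.merge(left_demo, right_demo, how="left", on=["date", "unit"],
                        suffixes=("_old", ""))
print(sorted(custom_merge.columns))
```

With the custom suffixes, only the stale left-hand column needs dropping afterwards and no rename is required.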

# Some Bookkeeping
df1 = df1.drop(columns='faults_x')
df1 = df1.rename(columns={'faults_y': 'faults'})  # rename returns a new frame, so assign it back

# Final Output
df1 = 
         date unit  faults
0  2016-06-14    x      76
1  2016-06-14    x      76
2  2016-06-15    x      12
3  2016-06-16    x      30
4  2016-06-16    x      30
5  2016-06-14    y      45
6  2016-06-14    y      45
7  2016-06-15    y      23
8  2016-06-15    y      23
9  2016-06-16    y      25
10 2016-06-16    y      25
11 2016-06-15    z      10
12 2016-06-16    z      26
13 2016-06-16    z      26
14 2016-06-17    z      43
15 2016-06-17    z      43
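
One thing to watch with a left merge: any (unit, date) pair in the left frame that has no match in the right frame keeps its row but gets NaN in faults. In the example data every pair matches, so this does not show up. A small sketch, assuming a made-up unmatched unit 'w' and hypothetical frame names, of how the indicator option of pd.merge could be used to spot such rows:

```python
import pandas as pd

# Hypothetical frames: unit 'w' exists only on the left side.
left_demo = pd.DataFrame({
    "unit": ["x", "x", "w"],
    "date": pd.to_datetime(["2016-06-14", "2016-06-14", "2016-06-14"]),
})
right_demo = pd.DataFrame({
    "unit": ["x"],
    "date": pd.to_datetime(["2016-06-14"]),
    "faults": [76],
})

# indicator=True adds a '_merge' column saying where each row came from.
merged = pd.merge(left_demo, right_demo, how="left",
                  on=["date", "unit"], indicator=True)

# Rows present only in the left frame have no faults value.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```

If unmatched rows are possible in the real data, the faults column will also be converted to a float dtype to hold the NaN values.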

Remember your joins and you will be fine!! :)

If you want to do it in one go:

df1 = pd.merge(df1.drop(columns='faults'), df2, how='left', on=['date', 'unit'])
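
For reference, here is the one-go merge as a self-contained script built from the question's data. The validate="many_to_one" argument is an addition not in the original answer: it asks pandas to raise an error if df2 ever contains a duplicate (unit, date) pair, which is the assumption the question states. Treat it as a sketch rather than the definitive version:

```python
import pandas as pd

df1 = pd.DataFrame({
    "unit": ["x"] * 5 + ["y"] * 6 + ["z"] * 5,
    "date": pd.to_datetime([
        "2016-06-14", "2016-06-14", "2016-06-15", "2016-06-16", "2016-06-16",
        "2016-06-14", "2016-06-14", "2016-06-15", "2016-06-15", "2016-06-16", "2016-06-16",
        "2016-06-15", "2016-06-16", "2016-06-16", "2016-06-17", "2016-06-17",
    ]),
    "faults": None,
})

df2 = pd.DataFrame({
    "unit": ["x"] * 3 + ["y"] * 3 + ["z"] * 3,
    "date": pd.to_datetime([
        "2016-06-14", "2016-06-15", "2016-06-16",
        "2016-06-14", "2016-06-15", "2016-06-16",
        "2016-06-15", "2016-06-16", "2016-06-17",
    ]),
    "faults": [76, 12, 30, 45, 23, 25, 10, 26, 43],
})

# Drop the empty placeholder column first so the merge produces a single 'faults'.
result = pd.merge(
    df1.drop(columns="faults"),
    df2,
    how="left",                # keep every row of df1, duplicates included
    on=["date", "unit"],
    validate="many_to_one",    # assumption check: keys must be unique in df2
)
print(result)
```

The left merge keeps the row order of df1, so the result lines up with the loop-based output in the question.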