Johnny Johnny - 4 months ago 21
Python Question

Select rows with certain conditions AND THEN replace those rows with certain values in another column

IF 'state' is in '4. Closed' & 'closeDate' is in '2017/3/27'
THEN change/replace/update 'closeDate' with the date values in 'correctClosedDate' column

enter image description here

df= pd.DataFrame(
{"ID":['A','B','C','D','E'],
"state":['3. Cancelled', '4. Closed', '4. Closed', '3. Cancelled', '4. Closed' ],
"closeDate":['2017/4/12','2017/3/27','2017/4/1','2017/4/29','2017/3/27'],
"correctCloseDate":['', '2017/1/5', '', '', '2017/2/27']
})


Here, I tried locate the rows with these conditions, and I don't know how to replace these rows with values in 'correctCloseDate'.

df.loc[df['state'].isin('4. Closed') & df['closeDate'].isin(['2017-03-27']) == True]


I have an error saying:

TypeError: only list-like objects are allowed to be passed to isin(), you passed a [str]

My desired outcome will look like this.

enter image description here

Any help will be appreciated!

Answer Source

I think you need to_datetime, also was added parameter errors='coerce' for convert not datetimes to NaT (NaN for dates in pandas):

#if necessary convert to datetime
df['closeDate'] = pd.to_datetime(df['closeDate'])
df['correctCloseDate'] = pd.to_datetime(df['correctCloseDate'], errors='coerce')

Then create boolean mask and Series.mask for replace by mask, last remove unnecessary column by drop:

mask = (df['state'] == '4. Closed') & (df['closeDate'] == '2017-03-27')
df['closeDate'] = df['closeDate'].mask(mask, df['correctCloseDate'])
df = df.drop('correctCloseDate', axis=1)
print (df)
  ID  closeDate         state
0  A 2017-04-12  3. Cancelled
1  B 2017-01-05     4. Closed
2  C 2017-04-01     4. Closed
3  D 2017-04-29  3. Cancelled
4  E 2017-02-27     4. Closed

Alternative is use loc for replace:

mask = (df['state'] == '4. Closed') & (df['closeDate'] == '2017-03-27')
df.loc[mask, 'closeDate'] = df['correctCloseDate']
df = df.drop('correctCloseDate', axis=1)
print (df)
  ID  closeDate         state
0  A 2017-04-12  3. Cancelled
1  B 2017-01-05     4. Closed
2  C 2017-04-01     4. Closed
3  D 2017-04-29  3. Cancelled
4  E 2017-02-27     4. Closed

Solution with only strings - but then change 2017-03-27 to 2017/3/27:

mask = (df['state'] == '4. Closed') & (df['closeDate'] == '2017/3/27')
df['closeDate'] = df['closeDate'].mask(mask, df['correctCloseDate'])
df = df.drop('correctCloseDate', axis=1)
print (df)
  ID  closeDate         state
0  A  2017/4/12  3. Cancelled
1  B   2017/1/5     4. Closed
2  C   2017/4/1     4. Closed
3  D  2017/4/29  3. Cancelled
4  E  2017/2/27     4. Closed