Jeff Tilton Jeff Tilton - 16 days ago 9
Python Question

pandas apply function over columns of dataframe with if/truth statements

I have three columns in a pandas dataframe.

df = pd.DataFrame({'month':[1,2,3,4,5,6,7,8,9,10,11,12],
'day':[10,23,16,30,1,23,3,28,29,1,4,2],
'year':[1990,1990,1990,1990,1990,1990,1990,1990,1990,1990,1990,1990]})


I want to return the water year in another column which is defined as year +1 for any day after September 28. For example September 29 1990 is water year 1991.

I have a function that works on its own

def wy(month, day, year):
if month >8 and day > 28:
year = year + 1
else:
if month>9:
year = year + 1
else:
year = year
return year
wy(9,30,2000)


But

df['wy'] = wy(df['month'],df['day'],df['year'])


returns the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().


When I tried to apply it to the dataframe. I have looked at the docs and other questions and have tried multiple other functions such as

def wy(month, day, year):
if any(month >8 & day > 28):
year = year + 1
else:
if any(month>9):
year = year + 1
else:
year = year
return year


def wy(month, day, year):
if any((month >8 & day > 28)|month>9):
year = year + 1
return year


But I still am returning errors. I have also tried a lambda expression (not very pretty)

df['wy'] = (lambda x,y,z:z+1 if (x >8 & y > 28)|y>9 else z, df['month'],df['day'],df['year'])


Thanks for any help

Answer

You can use a tuple of (month, day) and compare to that applied across the column axis, eg:

df = pd.DataFrame({'month':[1,2,3,4,5,6,7,8,9,10,11,12],
                   'day':[10,23,16,30,1,23,3,28,29,1,4,2],
                    'year':[1990,1990,1990,1990,1990,1990,1990,1990,1990,1990,1990,1990]})

Using:

df.apply(lambda L: L.year + ((L.month, L.day) >= (9, 28)), axis=1)

Gives you:

0     1990
1     1990
2     1990
3     1990
4     1990
5     1990
6     1990
7     1990
8     1991
9     1991
10    1991
11    1991
dtype: int64

This works because tuples are compared element by element and month/day are natural ordering. Any true condition evaluates to what's effectively 1 and a false condition to 0 - and we add that to the year to move it into the next if required.