wwl wwl - 2 months ago 11
Python Question

pandas assign value based on mean

Let's say I have a dataframe column. I want to create a new column where the value for a given observation is 1 if the corresponding value in the old column is above that column's average, and 0 if it is at or below the average.

What's the fastest way of doing this?

Answer

Say you have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 6, 2, 8, 3, 7, 1, 5]})
df['A'].mean()
Out: 4.111111111111111

Comparing the column against its mean gives you a boolean Series (True where the value is above the mean). You can cast that to integer:

df['B'] = (df['A'] > df['A'].mean()).astype(int)

or use np.where, which builds a NumPy array of 1s and 0s from the same comparison:

df['B'] = np.where(df['A'] > df['A'].mean(), 1, 0)

df
Out: 
   A  B
0  1  0
1  4  0
2  6  1
3  2  0
4  8  1
5  3  0
6  7  1
7  1  0
8  5  1
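
Since the question asks which way is fastest, one option is to time both approaches on your own data. The following is only a rough sketch: the frame `big`, its size, and the helper names `with_astype` and `with_where` are made up for illustration, and the numbers you get will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame so that timing differences are measurable
rng = np.random.default_rng(0)
big = pd.DataFrame({'A': rng.integers(0, 100, size=100_000)})


def with_astype(frame):
    # Boolean comparison cast to integer (returns a pandas Series)
    return (frame['A'] > frame['A'].mean()).astype(int)


def with_where(frame):
    # Same comparison through np.where (returns a NumPy array)
    return np.where(frame['A'] > frame['A'].mean(), 1, 0)


# Check that both approaches agree before comparing their speed
assert (with_astype(big).to_numpy() == with_where(big)).all()

for func in (with_astype, with_where):
    seconds = timeit.timeit(lambda: func(big), number=20)
    print(f'{func.__name__}: {seconds / 20:.6f} s per call')
```

Both variants are vectorized, so either should be far quicker than looping over the rows in Python. Measure on a frame of realistic size before picking one.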