DGraham - 1 year ago 366
Python Question

# multiple conditions for np.where python pandas

I have the following dataframe:

``````region  pop_1   pop_1_source    pop_2   pop_2_source    pop_3   pop_3_source
a   99  x1  84  x2  61  x3
b   64  x1  65  x2  16  x3
c   92  x1  26  x2  6   x3
d   82  x1  60  x2  38  x3
e   45  x1  77  x2  42  x3
``````

I can calculate the highest value found in each region through:

``````df['upper_limit'] = df[['pop_1','pop_2','pop_3']].max(axis=1)
``````

If I only compare two populations I can then calculate the source of the highest population i.e:

``````df['upper_limit_source'] = np.where(df.upper_limit == df['upper_limit'],df.pop_1,df.pop_2)
``````

However if I try expand this to search into all three columns, it fails to work.
I have searched for a solution, but cannot make anything work with np.where(np.logical_or or similar.

Am I missing something obvious?

I found your question a bit confusing (among other things, `df.upper_limit == df['upper_limit']` is always true, and your "source" columns are all filled with `x1` (except for one `1x` which looks like a typo)).

However, it seems like you'd like to find which of the three columns was responsible for the maximum, then calculate a value based on this. So, to calculate the column responsible, you could use `np.argmax`:

``````import numpy as np

idmax = np.argmax(df[['pop_1','pop_2','pop_3']].as_matrix(), axis=1)
``````

This will give you, for each row, 0, 1, or 2, depending on which of the three columns was responsible for the maximum.

Now if, for example, you'd like to choose `pop_1_source`, `pop_2_source`, or `pop_3_source`, according to the index, you could use `np.choose`:

``````np.choose(idmax, df[[`pop_1_source', 'pop_2_source', pop_3_source']].as_matrix().T)
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download