Nizag - 3 years ago
Python Question

Checking which rows contain a value efficiently

I am trying to write a function that checks for the presence of a value in a row across columns. I have a script that does this by iterating through columns, but I am worried that this will be inefficient when used on large datasets.

Here is my current code:

import pandas as pd

a = [1, 2, 3, 4]
b = [2, 3, 3, 2]
c = [5, 6, 1, 3]
d = [1, 0, 0, 99]

df = pd.DataFrame({'a': a,
                   'b': b,
                   'c': c,
                   'd': d})

cols = ['a', 'b', 'c', 'd']
df['e'] = False
for col in cols:
    # Parenthesize the comparison: == binds more loosely than + or |
    df['e'] = df['e'] | (df[col] == 1)
print(df)


result:

   a  b  c   d      e
0  1  2  5   1   True
1  2  3  6   0  False
2  3  3  1   0   True
3  4  2  3  99  False


As you can see, column e records whether the value 1 appears anywhere in that row. Is there a better or more efficient way to achieve this?

Answer

You can compare every value in the data frame to 1 with eq, then use any with axis=1 to check whether at least one comparison is true in each row:

df['e'] = df.eq(1).any(axis=1)
df
#   a   b   c   d   e
#0  1   2   5   1   True
#1  2   3   6   0   False
#2  3   3   1   0   True
#3  4   2   3   99  False
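
Since the question asks for a function, the same idea could be wrapped up roughly as follows. This is only a sketch: the name row_contains and its cols parameter are made up for illustration and are not part of pandas. Passing cols restricts the check to the listed columns, so an existing result column such as e is not compared as well.

```python
import pandas as pd

def row_contains(df, value, cols=None):
    """Return a boolean Series that is True where any checked column equals value.

    Hypothetical helper; `cols` limits the check to a subset of columns.
    """
    subset = df if cols is None else df[cols]
    # eq() compares element-wise; any(axis=1) reduces across the columns of each row
    return subset.eq(value).any(axis=1)

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [2, 3, 3, 2],
                   'c': [5, 6, 1, 3],
                   'd': [1, 0, 0, 99]})

df['e'] = row_contains(df, 1, cols=['a', 'b', 'c', 'd'])
print(df['e'].tolist())  # [True, False, True, False]
```

Both eq and any are vectorized, so there is no Python-level loop over the columns.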