Andreas Andreas - 1 month ago 18
Python Question

Pandas - efficient element comparisons

I am working with a large multiIndex DataFrame

results_matrix
with several columns. I need to access each element of the columns
Results1
and
Results2
and select one of those elements based on an indicator variable that is stored in the column
Indicator
of the DataFrame.

Currently, I am accessing each of the elements in a for loop - however, this increases the calculation time of the program quite a lot.
Is there a way to do this more efficiently?

import pandas as pd
import numpy as np

selected_results = pd.Series(np.nan)
# Used to iterate through the rows of the DataFrame
i = 0
for items in results_matrix['Indicator']:
if results_matrix.iloc[i]['Indicator'] == 1:
selected_results[i] = results_matrix.iloc[i]['Results1']
else:
selected_results[i] = results_matrix.iloc[i]['Results2']
i += 1

results_matrix['SelectedResults'] = selected_results.values

Answer

I think you need numpy.where:

results_matrix['SelectedResults'] = np.where(results_matrix['Indicator'] == 1, 
                                             results_matrix['Results1'], 
                                             results_matrix['Results2'])

Sample:

results_matrix = pd.DataFrame({'Indicator':[1,2,3],
                               'Results1':[4,5,6],
                               'Results2':[7,8,9]})

print (results_matrix)
   Indicator  Results1  Results2
0          1         4         7
1          2         5         8
2          3         6         9

results_matrix['SelectedResults'] = np.where(results_matrix['Indicator'] == 1, 
                                             results_matrix['Results1'], 
                                             results_matrix['Results2'])

print (results_matrix)                                             
   Indicator  Results1  Results2  SelectedResults
0          1         4         7                4
1          2         5         8                8
2          3         6         9                9