Andreas - 4 months ago 48

Python Question

I am working with a large multiIndex DataFrame

`results_matrix`

`Results1`

`Results2`

`Indicator`

Currently, I am accessing each of the elements in a for loop - however, this increases the calculation time of the program quite a lot.

Is there a way to do this more efficiently?

`import pandas as pd`

import numpy as np

selected_results = pd.Series(np.nan)

# Used to iterate through the rows of the DataFrame

i = 0

for items in results_matrix['Indicator']:

if results_matrix.iloc[i]['Indicator'] == 1:

selected_results[i] = results_matrix.iloc[i]['Results1']

else:

selected_results[i] = results_matrix.iloc[i]['Results2']

i += 1

results_matrix['SelectedResults'] = selected_results.values

Answer

I think you need `numpy.where`

:

```
results_matrix['SelectedResults'] = np.where(results_matrix['Indicator'] == 1,
results_matrix['Results1'],
results_matrix['Results2'])
```

Sample:

```
results_matrix = pd.DataFrame({'Indicator':[1,2,3],
'Results1':[4,5,6],
'Results2':[7,8,9]})
print (results_matrix)
Indicator Results1 Results2
0 1 4 7
1 2 5 8
2 3 6 9
results_matrix['SelectedResults'] = np.where(results_matrix['Indicator'] == 1,
results_matrix['Results1'],
results_matrix['Results2'])
print (results_matrix)
Indicator Results1 Results2 SelectedResults
0 1 4 7 4
1 2 5 8 8
2 3 6 9 9
```