DBB - 1 year ago 82

Python Question

So I want to take the average of all values in column B where column A has a particular value, and plot it using Matplotlib.

So in the table above I want to average the values in B and E for every distinct value in A, and hence create a new element where

A = 57, B = average of all values of B where A = 57, E = average of all values of E where A = 57, and so on.

And then finally plot the new element.

I tried to implement it by taking the values into another Identity matrix but that does not work.

```
for x in list_of_entries:
    Final['A'] = x;
    Final['C'] = 0;
    Final['D'] = 1;
    I = np.logical_and((1), (data_temp['A'].astype(int) == x))
    Final['B'] = np.average(data_temp[I]['B']);
    Final['E'] = np.average(data_temp[I]['E']);
    np.empty(I);
```

Answer Source

With NumPy only, you could use `np.unique(..., return_index=True)` to find the indices which demarcate the chunks with constant `A` value:

```
data_temp.sort(order=['A'])
uniqs, idx = np.unique(data_temp['A'], return_index=True)
idx = np.r_[idx, len(data_temp)]
# >>> idx
# array([ 0, 10, 20, 33, 42, 50, 58, 71, 79, 90, 100])
```

Then you can access the chunks of `data_temp` with constant `A` value using

```
data_temp[idx[i]:idx[i+1]]
```

for each `i = 0, ..., len(idx)-2`.
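As a rough illustration of the chunking step, here is a minimal sketch on a toy structured array. The array `data`, its values, and the name `chunk_means` are made up for this example and are not taken from the question's file.

```python
import numpy as np

# Hypothetical toy data: a structured array with columns A and B.
data = np.array(
    [(57, 1.0), (58, 4.0), (57, 3.0), (58, 6.0), (59, 5.0)],
    dtype=[('A', '<i8'), ('B', '<f8')],
)
data.sort(order=['A'])  # equal A values become contiguous runs

# return_index=True yields the first position of each unique A value
uniqs, idx = np.unique(data['A'], return_index=True)
idx = np.r_[idx, len(data)]  # append the end of the last chunk

chunk_means = {}
for i in range(len(idx) - 1):
    chunk = data[idx[i]:idx[i + 1]]  # all rows sharing one A value
    chunk_means[int(uniqs[i])] = float(chunk['B'].mean())

print(idx)
print(chunk_means)
```

Each pair of consecutive entries in `idx` brackets one run of equal `A` values, which is why the loop stops at `len(idx) - 2`.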

This is quicker than using

```
for val in uniqs:
    mask = data_temp['A'] == val
    # data_temp is a NumPy structured array, so index with the mask directly
    chunk = data_temp[mask]
```

because accessing basic slices is much faster than advanced indexing with boolean selection masks.
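One reason for the difference, sketched below on a throwaway array `a` (an assumption for illustration only): a basic slice is a view onto the existing buffer, while a boolean mask has to compare every element and then copy the selected ones into a new array.

```python
import numpy as np

a = np.arange(10)

sliced = a[2:5]                 # basic slice: a view, nothing is copied
masked = a[(a >= 2) & (a < 5)]  # boolean mask: scans all of a, then copies

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, masked))  # False
```

With the mask approach that full scan is repeated once per unique value, whereas the sort-and-slice approach pays for one sort up front and then only takes views.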

```
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2016)
# Build a 100-row structured array with integer columns A..F.
# The dtype is pinned to int64 so the view onto '<i8' fields works on every platform.
data_temp = np.random.randint(10, size=(6*100), dtype=np.int64).view(
    [(col, '<i8') for col in list('ABCDEF')])
data_temp.sort(order=['A'])

# First index of each unique A value, plus the end of the array
uniqs, idx = np.unique(data_temp['A'], return_index=True)
idx = np.r_[idx, len(data_temp)]

result = []
for i in range(len(idx)-1):
    val = uniqs[i]
    start, end = idx[i], idx[i+1]
    # Uncomment to see the chunks of `data_temp` with constant A value
    # print(data_temp[start:end])
    mean = {col: data_temp[col][start:end].mean() for col in ['B', 'E']}
    result.append([val, mean['B'], 0, 1, mean['E']])  # columns: A, B, C, D, E

result = np.array(result)
print(result)

fig, ax = plt.subplots()
ax.plot(result[:, 0], result[:, 1])  # mean of B against A
ax.plot(result[:, 0], result[:, 4])  # mean of E against A
plt.show()
```

If you have pandas, the whole calculation becomes much simpler:

```
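import pandas as pd
import matplotlib.pyplot as plt

# `dir_readfile` and `names` come from the question's own setup (not shown here)
data_temp = pd.read_csv(dir_readfile, delimiter='\t', skiprows=1, names=names,
                        usecols=list(range(6)))
fig, ax = plt.subplots()
result = data_temp.groupby('A').agg({'B': 'mean', 'E': 'mean'})
result.plot(ax=ax)  # draw on the axes created above instead of opening a second figure
plt.show()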
```
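Since the snippet above depends on the asker's file, here is a self-contained sketch of the same `groupby` step on a small made-up DataFrame. The frame `df` and its values are assumptions for illustration, and plotting is left out so it runs without a display.

```python
import pandas as pd

# Hypothetical toy data standing in for the file read above.
df = pd.DataFrame({
    'A': [57, 58, 57, 58, 59],
    'B': [1.0, 4.0, 3.0, 6.0, 5.0],
    'E': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One row per distinct A, holding the mean of B and the mean of E
result = df.groupby('A').agg({'B': 'mean', 'E': 'mean'})
print(result)
```

The distinct `A` values become the index of `result`, so calling `result.plot()` would put `A` on the x-axis with one line each for `B` and `E`.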