georussell - 7 months ago 27

Python Question

I'm hoping to delete the columns of my arrays that have repeated entries in row 1, as shown below. Row 1 contains repeats of the values 1 and 2.5, so the second occurrence of each of those values has been deleted, together with the column it lies in.

```
initial_array =
[[  1,   1,   1,   1,   1,   1,   1,   1],   # row 0
 [0.5,   1, 2.5,   4, 2.5,   2,   1, 3.5],   # row 1
 [  1, 1.5,   3, 4.5,   3, 2.5, 1.5,   4],   # row 2
 [228, 314, 173, 452, 168, 351, 300, 396]]   # row 3
```

```
final_array =
[[  1,   1,   1,   1,   1,   1],   # row 0
 [0.5,   1, 2.5,   4,   2, 3.5],   # row 1
 [  1, 1.5,   3, 4.5, 2.5,   4],   # row 2
 [228, 314, 173, 452, 351, 396]]   # row 3
```

Approaches I was considering include a function that checks for repeats, returning True the second (or later) time a value turns up in row 1, and then using that result to delete the corresponding column. Another option might be the `return_index` option of `numpy.unique`. I just can't quite find a way through it or find the right function.
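The "flag the repeats, then delete those columns" idea can be sketched with `np.unique(..., return_index=True)` and a boolean mask. This is a minimal sketch assuming the data is a float NumPy array named `arr` (rebuilt here from the numbers above); `first_idx` and `keep` are illustrative names.

```python
import numpy as np

# The example array from the question
arr = np.array([
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
], dtype=float)

# Column index of the first occurrence of every distinct value in row 1
_, first_idx = np.unique(arr[1], return_index=True)

# keep[j] is True the first time a value appears and False for every repeat,
# so ~keep is the "True on the second (or later) occurrence" flag
keep = np.zeros(arr.shape[1], dtype=bool)
keep[first_idx] = True

# Indexing with a boolean mask keeps the surviving columns in their original order
final_array = arr[:, keep]
print(final_array)
```

Because the mask is applied in column order, no re-sorting is needed afterwards.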

If I could also put the mean of the retained repeat and the deleted one into row 3 of the retained column, that would be even better (see below).

```
final_array_averaged =
[[  1,   1,     1,   1,   1,   1],   # row 0
 [0.5,   1,   2.5,   4,   2, 3.5],   # row 1
 [  1, 1.5,     3, 4.5, 2.5,   4],   # row 2
 [228, 307, 170.5, 452, 351, 396]]   # row 3
```
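As a beginner-friendly alternative (plain Python grouping rather than the NumPy options discussed below), the averaged result can be built by grouping column indices by their row-1 value in a dictionary. This is a sketch assuming a float NumPy array named `arr`; it loops in Python, so it will be slower than a vectorised approach on very wide arrays. `groups` and `first_cols` are illustrative names.

```python
import numpy as np

arr = np.array([
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
], dtype=float)

# Map each row-1 value to the list of columns where it occurs.
# Dicts keep insertion order (Python 3.7+), so groups follow first appearance.
groups = {}
for col, key in enumerate(arr[1]):
    groups.setdefault(key, []).append(col)

# Keep the first column of each group (fancy indexing returns a copy) ...
first_cols = [cols[0] for cols in groups.values()]
final_array_averaged = arr[:, first_cols]

# ... and overwrite row 3 with the mean of row 3 over each group
final_array_averaged[3] = [arr[3, cols].mean() for cols in groups.values()]
print(final_array_averaged)
```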

Thanks in advance for any help you can give to a beginner who is stumped!

Answer

You can use the optional arguments that come with `np.unique` (`return_index`, `return_inverse` and `return_counts`) and then use `np.bincount` with the last row as weights to get the final averaged output, like so -

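```
import numpy as np

# unqID: first column of each unique row-1 value; tag: group of every column; C: group sizes
_, unqID, tag, C = np.unique(arr[1], return_index=True, return_inverse=True, return_counts=True)
out = arr[:, unqID]                     # keep first occurrences (fancy indexing copies)
out[-1] = np.bincount(tag, arr[3]) / C  # mean of the last row within each group
```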
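To make these lines less opaque, the sketch below rebuilds the question's array and keeps each intermediate result. The names `vals`, `sums` and `means` are illustrative additions. `np.bincount(tag, weights=...)` adds up the weights of all columns that share a group number, so dividing by the counts `C` gives one mean per group.

```python
import numpy as np

arr = np.array([
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
], dtype=float)

vals, unqID, tag, C = np.unique(arr[1], return_index=True,
                                return_inverse=True, return_counts=True)
print(vals)   # sorted distinct values of row 1: 0.5, 1, 2, 2.5, 3.5, 4
print(unqID)  # column of the first occurrence of each value: 0, 1, 5, 2, 7, 3
print(tag)    # group number of every original column: 0, 1, 3, 5, 3, 2, 1, 4
print(C)      # size of each group: 1, 2, 1, 2, 1, 1

sums = np.bincount(tag, weights=arr[3])  # e.g. group 1 -> 314 + 300 = 614
means = sums / C                         # e.g. group 1 -> 614 / 2 = 307
print(means)
```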

Sample run -

```
In [212]: arr
Out[212]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2.5, 2. , 1. , 3.5],
[ 1. , 1.5, 3. , 4.5, 3. , 2.5, 1.5, 4. ],
[ 228. , 314. , 173. , 452. , 168. , 351. , 300. , 396. ]])
In [213]: out
Out[213]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2. , 2.5, 3.5, 4. ],
[ 1. , 1.5, 2.5, 3. , 4. , 4.5],
[ 228. , 307. , 351. , 170.5, 396. , 452. ]])
```

As can be seen, the output columns are now reordered so that the second row is sorted, because `np.unique` returns the unique values in sorted order. If you want to keep the original column order, index the columns of `out` with `np.argsort` of `unqID`, like so -

```
In [221]: out[:,unqID.argsort()]
Out[221]:
array([[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ 0.5, 1. , 2.5, 4. , 2. , 3.5],
[ 1. , 1.5, 3. , 4.5, 2.5, 4. ],
[ 228. , 307. , 170.5, 452. , 351. , 396. ]])
```
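If this needs to run on several arrays, the steps above can be wrapped in a small function. This is a sketch: the name `dedupe_columns_mean` and its `key_row`/`value_row` parameters are hypothetical, and it assumes `key_row` holds the values to de-duplicate and `value_row` the values to average. It converts the input to float so that means such as 170.5 are not truncated when written back into an integer array.

```python
import numpy as np

def dedupe_columns_mean(arr, key_row=1, value_row=-1):
    """Hypothetical helper: drop columns whose key_row value repeats, keeping
    the first occurrence, and store the group mean of value_row in the kept
    column. The original column order is preserved."""
    arr = np.asarray(arr, dtype=float)  # float so averages are not truncated
    _, first, tag, counts = np.unique(arr[key_row], return_index=True,
                                      return_inverse=True, return_counts=True)
    means = np.bincount(tag, weights=arr[value_row]) / counts
    order = np.argsort(first)           # groups ordered by first appearance
    out = arr[:, first[order]]          # fancy indexing returns a copy
    out[value_row] = means[order]
    return out

arr = np.array([
    [1,   1,   1,   1,   1,   1,   1,   1],
    [0.5, 1,   2.5, 4,   2.5, 2,   1,   3.5],
    [1,   1.5, 3,   4.5, 3,   2.5, 1.5, 4],
    [228, 314, 173, 452, 168, 351, 300, 396],
])
print(dedupe_columns_mean(arr))
```

Indexing with `first[order]` directly gives the same columns as building the sorted `out` and reordering it with `unqID.argsort()` afterwards.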