Reed - 5 months ago 50

Python Question

Python newbie here. I have read "Filter rows of a numpy array?" and the docs, but I still can't figure out how to write this the Pythonic way.

Here is an example array (the real data is 50000 x 10):

`a = numpy.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])`

`filter = ['a','c']`

I need to find all rows in `a` where `a[:, 1]` is in `filter`. The expected output is:

`[[2,'a'],[4,'c']]`

My current code is this:

`numpy.asarray([x for x in a if x[1] in filter ])`

It works okay but I have read somewhere that it is not efficient. What is the proper numpy method for this?

Thanks for all the correct answers! Unfortunately I can only mark one as the accepted answer. I am surprised that `numpy.in1d` does not show up when searching for `numpy filter 2d array`.

Answer

You can use a `bool` index array, which you can produce using `np.in1d`.

You can index an `np.ndarray` along any `axis` you want using, for example, an array of `bool`s indicating whether each element should be included. Since you want to index along `axis=0`, meaning you want to choose from the outermost index, you need a 1D `np.array` whose length is the number of rows. Each of its elements indicates whether the corresponding row should be included.
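To make the boolean indexing step concrete, here is a minimal sketch that uses a hand-written mask instead of a computed one. The names `keep_rows` and `selected` are made up for this illustration. Note that `a` holds strings only, because NumPy converts the mixed list to a single string dtype.

```python
import numpy as np

a = np.asarray([[2, 'a'], [3, 'b'], [4, 'c'], [5, 'd']])

# One boolean per row (axis 0): True keeps the row, False drops it.
# keep_rows is a hypothetical, hand-written mask for illustration only.
keep_rows = np.array([True, False, True, False])

# Indexing with a 1D boolean array selects along the first axis.
selected = a[keep_rows]
print(selected)
```

The mask must have exactly as many entries as `a` has rows, otherwise NumPy raises an `IndexError`.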

A fast way to get this is to use `np.in1d` on the second column of `a`. You get all elements of that column with `a[:, 1]`. Now you have a 1D `np.array` whose elements should be checked against your filter. That's what `np.in1d` is for.

So the complete code would look like:

```
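import numpy as np

a = np.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])
filter = np.asarray(['a','c'])  # note: this name shadows the built-in filter()
# Keep only the rows whose second column appears in filter
a[np.in1d(a[:, 1], filter)]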
```

or in a longer form:

```
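import numpy as np

a = np.asarray([[2,'a'],[3,'b'],[4,'c'],[5,'d']])
filter = np.asarray(['a','c'])
# Boolean mask with one entry per row: True where the second column is in filter
mask = np.in1d(a[:, 1], filter)
# Boolean indexing along axis 0 keeps only the rows where mask is True
a[mask]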
```
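As a side note, on recent NumPy versions `np.isin` is the recommended replacement for `np.in1d`, which is deprecated as of NumPy 2.0. The sketch below shows the same filtering with `np.isin`. The names `wanted`, `result`, and `others` are assumptions made for this example and are not part of the original question.

```python
import numpy as np

a = np.asarray([[2, 'a'], [3, 'b'], [4, 'c'], [5, 'd']])
# Hypothetical name; avoids shadowing the built-in filter().
wanted = ['a', 'c']

# Boolean mask: True where the second column is one of the wanted values.
mask = np.isin(a[:, 1], wanted)
result = a[mask]

# invert=True selects the complement, i.e. the rows that do NOT match.
others = a[np.isin(a[:, 1], wanted, invert=True)]

print(mask)
print(result)
print(others)
```

For a 1D column like `a[:, 1]`, `np.isin` returns a mask of the same shape as `np.in1d` would, so it can be used for row indexing in the same way.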