Antonio López Ruiz - 1 year ago
Python Question

# Selecting highest rows on matrix pandas python.

I have the following data:

https://github.com/antonio1695/Python/blob/master/nearBPO/facturasb.csv

It is a matrix like the following example:

``````UUID  A   B   C   D   E   F   G   H   I
1.1   0   1   0   0   0   1   0   0   0
1.2   1   1   0   0   0   0   0   0   0
1.3   0   0   0   0   1   0   0   0   0
1.4   0   0   0   1   0   1   1   1   1
1.5   0   1   0   0   0   0   1   0   0
1.6   0   0   1   0   0   0   1   0   0
1.7   0   1   0   0   0   0   0   1   0
1.8   0   0   1   0   0   0   1   0   0
1.9   0   1   0   0   0   0   1   0   1
``````

I would like to make a new matrix with only the 50 highest columns (3 in the example) and the respective UUIDs. By "highest columns" I mean the columns that contain the most 1's in the matrix.

If I'm not clear enough, don't hesitate to ask. Thank you.

IIUC

``````df[df.sum().nlargest(3).index]
``````
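To see why this works, it can help to look at the intermediate column sums. A minimal sketch on the example data from the question, assuming UUID is used as the index (the names `counts`, `top3` and `result` are mine):

```python
from io import StringIO

import pandas as pd

text = """UUID  A   B   C   D   E   F   G   H   I
1.1   0   1   0   0   0   1   0   0   0
1.2   1   1   0   0   0   0   0   0   0
1.3   0   0   0   0   1   0   0   0   0
1.4   0   0   0   1   0   1   1   1   1
1.5   0   1   0   0   0   0   1   0   0
1.6   0   0   1   0   0   0   1   0   0
1.7   0   1   0   0   0   0   0   1   0
1.8   0   0   1   0   0   0   1   0   0
1.9   0   1   0   0   0   0   1   0   1"""

# UUID becomes the index so that it is not summed with the 0/1 columns
df = pd.read_csv(StringIO(text), sep=r"\s+", index_col="UUID")

# Number of 1's in each column
counts = df.sum()

# Labels of the 3 columns with the most 1's
top3 = counts.nlargest(3).index

# Sub-frame with only those columns; the UUID index comes along
result = df[top3]

print(counts.to_dict())
print(list(top3))
```

Because UUID is the index rather than a column, it stays attached to every row of `result` without being counted.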

To exclude rows that are all zeros across the n largest columns:

``````n = df.sum().nlargest(3).index
df1 = df.loc[:, n]
df1[df1.eq(1).any(axis=1)]
``````
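For the real file the same two steps could be wrapped in a small function so that `n` can be set to 50. This is only a sketch: `top_columns` and its `drop_empty_rows` flag are hypothetical names of mine, and the tiny frame below is made up for illustration.

```python
import pandas as pd


def top_columns(df, n, drop_empty_rows=True):
    """Keep the n columns with the most 1's (hypothetical helper)."""
    cols = df.sum().nlargest(n).index
    out = df.loc[:, cols]
    if drop_empty_rows:
        # Keep only rows that have at least one 1 in the selected columns
        out = out[out.eq(1).any(axis=1)]
    return out


# Tiny made-up frame, indexed by UUID
df = pd.DataFrame(
    {"A": [1, 0, 0], "B": [1, 1, 0], "C": [0, 0, 1], "D": [1, 1, 0]},
    index=pd.Index(["u1", "u2", "u3"], name="UUID"),
)

small = top_columns(df, 2)
print(small)
```

With the question's data loaded into `df`, the call would be `top_columns(df, 50)`.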

### Setup

``````from io import StringIO
import pandas as pd

text = """UUID  A   B   C   D   E   F   G   H   I
1.1   0   1   0   0   0   1   0   0   0
1.2   1   1   0   0   0   0   0   0   0
1.3   0   0   0   0   1   0   0   0   0
1.4   0   0   0   1   0   1   1   1   1
1.5   0   1   0   0   0   0   1   0   0
1.6   0   0   1   0   0   0   1   0   0
1.7   0   1   0   0   0   0   0   1   0
1.8   0   0   1   0   0   0   1   0   0
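1.9   0   1   0   0   0   0   1   0   1"""

# Parse the whitespace-separated text, using UUID as the index
df = pd.read_csv(StringIO(text), sep=r"\s+", index_col="UUID")
``````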

### Bonus solution with numpy

Assuming same setup (this is probably quicker)

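``````a = df.values
# Positions of the 3 columns with the largest sums, largest first
n = a.sum(axis=0).argsort()[-3:][::-1]
# Rows with at least one 1 among those columns
m = (a[:, n] == 1).any(axis=1)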

df.iloc[m, n]
``````

Notice that the columns are not the same as in my other solution. That is because several columns sum to the same value, and the two approaches break such ties differently.
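The tie is visible in the example: B and G both sum to 5, while C, F, H and I all sum to 2, so the third column depends on how equal values are ordered. `nlargest` defaults to `keep='first'`, whereas `argsort` only guarantees the relative order of equal values when `kind='stable'` is passed. If matching results matter, one option (my suggestion, not part of the answer above) is a stable sort on the negated sums:

```python
import numpy as np
import pandas as pd

# Column sums from the example in the question
sums = pd.Series([1, 5, 2, 1, 1, 2, 5, 2, 2], index=list("ABCDEFGHI"))

# pandas: ties are resolved in favour of the column that appears first
pandas_top = list(sums.nlargest(3).index)

# numpy: a stable sort on the negated sums keeps earlier columns first among ties
order = np.argsort(-sums.to_numpy(), kind="stable")[:3]
numpy_top = list(sums.index[order])

print(pandas_top, numpy_top)
```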

### Timing
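
To compare the two approaches on your own machine, something like the following could be used. It is a rough sketch on made-up random 0/1 data; the function names `with_pandas` and `with_numpy`, the data shape, and the repeat count are all arbitrary choices of mine, and the numbers will vary with hardware, library versions, and data shape.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up 0/1 data; the shape is an arbitrary assumption
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 2, size=(10_000, 100)))


def with_pandas(df, k=50):
    # pandas version: column sums, then nlargest
    cols = df.sum().nlargest(k).index
    out = df.loc[:, cols]
    return out[out.eq(1).any(axis=1)]


def with_numpy(df, k=50):
    # numpy version: argsort on the column sums of the underlying array
    a = df.to_numpy()
    n = a.sum(axis=0).argsort()[-k:][::-1]
    m = (a[:, n] == 1).any(axis=1)
    return df.iloc[m, n]


t_pandas = timeit.timeit(lambda: with_pandas(df), number=5)
t_numpy = timeit.timeit(lambda: with_numpy(df), number=5)
print(f"pandas: {t_pandas:.4f}s  numpy: {t_numpy:.4f}s")
```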
