Savage Henry - 1 year ago 85

Python Question

I have a dataframe with the following format:

    id  x1  x2  x3
    A    1   0   0
    A    0   1   0
    A    0   0   0
    B    1   0   0
    B    0   0   0
    B    0   0   1

I would like to "compress" this based on the repeated values in `id`, so that a column holds a `1` if any row for that `id` has a `1` in it, giving one row per `id`:

    id  x1  x2  x3
    A    1   1   0
    B    1   0   1

My actual data set is millions of rows with potentially thousands of columns. I could do this by grouping by rows and looping over columns, but would like to know if there is a more Pythonic/Pandified/memory efficient way of doing this.

I have looked at answers for questions that are concerned with keeping the max (or some other function) of a column, but that goes back to having to loop over all the columns. (Here is one example.)

Thanks!

Answer Source

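A simple and efficient way to do this in pandas is to group by `id` and take the column-wise maximum, which handles every column at once without an explicit loop: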

```
In [8]: df.groupby('id').max()
Out[8]:
    x1  x2  x3
id
A    1   1   0
B    1   0   1
```
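For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question by hand; the variable names `compressed` and `result` are only illustrative, and the `reset_index()` step is an assumption that you want `id` back as an ordinary column rather than as the index.

```python
import pandas as pd

# Sample data copied from the question: one row per observation,
# with 0/1 indicator columns x1..x3.
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "x1": [1, 0, 0, 1, 0, 0],
    "x2": [0, 1, 0, 0, 0, 0],
    "x3": [0, 0, 0, 0, 0, 1],
})

# For 0/1 columns, the per-group maximum is 1 exactly when at least
# one row in the group has a 1, so max() acts as a logical "any".
compressed = df.groupby("id").max()

# Optional: turn the group key back into a regular column.
result = compressed.reset_index()
print(result)
```

Because `max()` is applied to every non-key column in one call, the same line should work unchanged for a frame with thousands of indicator columns, though memory use on a very large frame is worth measuring on your own data.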