Savage Henry - 2 months ago 12

Python Question

I have a dataframe with the following format:

    id  x1  x2  x3
    A    1   0   0
    A    0   1   0
    A    0   0   0
    B    1   0   0
    B    0   0   0
    B    0   0   1

I would like to "compress" this based on the repeated values in `id`, so that each column holds a `1` if any row for that `id` has a `1` in it. The result would be:

    id  x1  x2  x3
    A    1   1   0
    B    1   0   1

My actual data set has millions of rows and potentially thousands of columns. I could do this by grouping the rows and looping over the columns, but I would like to know if there is a more Pythonic/Pandified/memory-efficient way of doing this.

I have looked at answers for questions that are concerned with keeping the max (or some other function) of a column, but that goes back to having to loop over all the columns. (Here is one example.)

Thanks!

Answer

A simple and efficient way to do this in pandas is to group by `id` and take the column-wise maximum, which needs no explicit loop over the columns:

```
In [8]: df.groupby('id').max()
Out[8]:
    x1  x2  x3
id
A    1   1   0
B    1   0   1
```
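For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same `groupby('id').max()` call. The variable names `compressed` and `result` are my own, and the `reset_index()` step is an assumption about the desired layout (it turns `id` back into an ordinary column, as in the question's expected output).

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "id": ["A", "A", "A", "B", "B", "B"],
        "x1": [1, 0, 0, 1, 0, 0],
        "x2": [0, 1, 0, 0, 0, 0],
        "x3": [0, 0, 0, 0, 0, 1],
    }
)

# Column-wise maximum within each id group: with 0/1 data, a column
# ends up 1 if any row of that group had a 1 in it.
compressed = df.groupby("id").max()

# Optional: move id from the index back into a regular column.
result = compressed.reset_index()
print(result)
```

Because the maximum is taken over every non-key column at once, the same call should work unchanged however many `x` columns the real data has.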