Adam - 9 months ago 70

Python Question

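I have a data frame with two columns, `A` and `B`. The order of `A` and `B` is unimportant in this context; for example, I would consider `(0,50)` and `(50,0)` to be duplicates. I would like to remove these duplicates.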

```
import pandas as pd

# Initial data frame.
data = pd.DataFrame({'A': [0, 10, 11, 21, 22, 35, 5, 50],
                     'B': [50, 22, 35, 5, 10, 11, 21, 0]})
data
#     A   B
# 0   0  50
# 1  10  22
# 2  11  35
# 3  21   5
# 4  22  10
# 5  35  11
# 6   5  21
# 7  50   0

# Desired output with "duplicates" removed.
data2 = pd.DataFrame({'A': [0, 5, 10, 11],
                      'B': [50, 21, 22, 35]})
data2
#     A   B
# 0   0  50
# 1   5  21
# 2  10  22
# 3  11  35
```

Ideally, the output would be sorted by the values of column `A`.

Answer

You can sort each row of the data frame before dropping the duplicates:

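```
# Sort the values within each row, then drop rows that are now identical.
# On pandas 0.23 and later, a list returned from apply(axis=1) yields a Series
# of lists unless result_type is given; 'broadcast' keeps columns A and B.
data.apply(lambda r: sorted(r), axis=1, result_type='broadcast').drop_duplicates()
#     A   B
# 0   0  50
# 1  10  22
# 2  11  35
# 3   5  21
```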

If you prefer the result to be sorted by column `A`:

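```
# Same as before, then order the surviving rows by column A.
# result_type='broadcast' (pandas 0.23+) keeps the result a DataFrame.
(data.apply(lambda r: sorted(r), axis=1, result_type='broadcast')
     .drop_duplicates()
     .sort_values('A'))
#     A   B
# 0   0  50
# 3   5  21
# 1  10  22
# 2  11  35
```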
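On larger frames, a row-wise `apply` can be slow. As an alternative to the approach above (not part of the original answer), here is a minimal sketch that sorts within each row using NumPy and then applies the same `drop_duplicates` and `sort_values` steps. It assumes both columns hold comparable numeric values and that pandas 0.24 or later is available for `to_numpy()`; the names `sorted_rows` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [0, 10, 11, 21, 22, 35, 5, 50],
                     'B': [50, 22, 35, 5, 10, 11, 21, 0]})

# Sort the two values within each row in a single vectorized call,
# so that (50, 0) and (0, 50) both become (0, 50).
sorted_rows = np.sort(data.to_numpy(), axis=1)

# Rebuild a frame with the original column names, drop the rows that
# are now identical, order by column A, and renumber the index.
result = (pd.DataFrame(sorted_rows, columns=data.columns)
            .drop_duplicates()
            .sort_values('A')
            .reset_index(drop=True))
print(result)
```

The final `reset_index(drop=True)` is optional; it renumbers the rows from 0 so that the result lines up with the desired `data2` frame from the question.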