EnergyNet - 11 months ago 56

Python Question

I have problem to find the best 'python way' to get back a position(row|column) of value in a pandas DataFrame.

I have a list of numbers...

`list = [1,2,3,4,5,8]`

and a pandas Dataframe.

`df = pd.DataFrame({'A':[1,3,8,8], 'B':[3,3,2,8],'x':[0.4,0.3,0.5,0.8]})`

df

Out[2]:

A B x

0 1 3 0.4

1 3 3 0.3

2 8 2 0.5

3 8 8 0.8

- I will compare the numbers from the list with the numbers in the DataFrame (['A'] and ['B']). In the end I want to know, which number in the list occurs in the DataFrame just once.

I would loop over the DataFrame with every number in the list, but I think this isn't the best python way.

- I need the position of the once occurrences value in the DataFrame in the format (row|column), because if the single Number is in df['B'] then I need additional the value of df['A']. If the single Number is in df['A'] I need additional the value in df['B'],

I have no idea to solve this problem...I would be glad, if you have some keywords for searching so I can fix the problem.

Later I will copy the row, which includes the single numbers and the value behind the numbers in a new DataFrame.

The target is to get the following output...

`dfnew`

SingleNumber AorB x

0 1 3 0.4

1 2 8 0.5

I'm happy for every information to solve this problem. Let me know, if you need additional background information.

PS: I'm a beginner :)

Answer Source

Given your starting point of (note I've renamed `list`

to `data`

otherwise it's shadowing the builtin):

```
data = [1,2,3,4,5,8]
df = pd.DataFrame({'A':[1,3,8,8], 'B':[3,3,2,8],'x':[0.4,0.3,0.5,0.8]})
```

Firstly, flatten out your frame so that you have a single column to work with:

```
flattened = pd.melt(df, value_vars=['A', 'B'])
```

Which gives you:

```
variable value
0 A 1
1 A 3
2 A 8
3 A 8
4 B 3
5 B 3
6 B 2
7 B 8
```

Then filter for the values that are in `data`

(in this case it returns the same dataframe, so I won't copy/paste the same result as above):

```
in_data = flattened[flattened.value.isin(data)]
```

Then drop *all* duplicate values:

```
only_once = in_data.drop_duplicates(subset='value', keep=False)
```

Which gives you:

```
variable value
0 A 1
6 B 2
```

Then you can use the index on that to get back to your original DF:

```
new_df = df.iloc[only_once.index // len(df.columns)]
```

Which gives you:

```
A B x
0 1 3 0.4
2 8 2 0.5
```

Then assign the column...

```
new_df['single_number'] = only_once.value.values
```

And the end result is:

```
A B x single_number
0 1 3 0.4 1
2 8 2 0.5 2
```

This maintains your original index values, if you really want new ones, then look at `.reset_index(drop=True)`

to get 0 and 1 instead.