EnergyNet EnergyNet - 1 year ago 90
Python Question

Find position of value, that occurs only once in dataframe

I have problem to find the best 'python way' to get back a position(row|column) of value in a pandas DataFrame.

I have a list of numbers...

list = [1,2,3,4,5,8]

and a pandas Dataframe.

df = pd.DataFrame({'A':[1,3,8,8], 'B':[3,3,2,8],'x':[0.4,0.3,0.5,0.8]})

A B x
0 1 3 0.4
1 3 3 0.3
2 8 2 0.5
3 8 8 0.8

  1. I will compare the numbers from the list with the numbers in the DataFrame (['A'] and ['B']). In the end I want to know, which number in the list occurs in the DataFrame just once.

I would loop over the DataFrame with every number in the list, but I think this isn't the best python way.

  1. I need the position of the once occurrences value in the DataFrame in the format (row|column), because if the single Number is in df['B'] then I need additional the value of df['A']. If the single Number is in df['A'] I need additional the value in df['B'],

I have no idea to solve this problem...I would be glad, if you have some keywords for searching so I can fix the problem.

Later I will copy the row, which includes the single numbers and the value behind the numbers in a new DataFrame.

The target is to get the following output...


SingleNumber AorB x
0 1 3 0.4
1 2 8 0.5

I'm happy for every information to solve this problem. Let me know, if you need additional background information.

PS: I'm a beginner :)

Answer Source

Given your starting point of (note I've renamed list to data otherwise it's shadowing the builtin):

data = [1,2,3,4,5,8]
df = pd.DataFrame({'A':[1,3,8,8], 'B':[3,3,2,8],'x':[0.4,0.3,0.5,0.8]})

Firstly, flatten out your frame so that you have a single column to work with:

flattened = pd.melt(df, value_vars=['A', 'B'])

Which gives you:

  variable  value
0        A      1
1        A      3
2        A      8
3        A      8
4        B      3
5        B      3
6        B      2
7        B      8

Then filter for the values that are in data (in this case it returns the same dataframe, so I won't copy/paste the same result as above):

in_data = flattened[flattened.value.isin(data)]

Then drop all duplicate values:

only_once = in_data.drop_duplicates(subset='value', keep=False)

Which gives you:

  variable  value
0        A      1
6        B      2

Then you can use the index on that to get back to your original DF:

new_df = df.iloc[only_once.index // len(df.columns)]

Which gives you:

   A  B    x
0  1  3  0.4
2  8  2  0.5

Then assign the column...

new_df['single_number'] = only_once.value.values

And the end result is:

   A  B    x  single_number
0  1  3  0.4              1
2  8  2  0.5              2

This maintains your original index values, if you really want new ones, then look at .reset_index(drop=True) to get 0 and 1 instead.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download