seth127 - 2 months ago 9
Python Question

check if pair of values is in pair of columns in pandas

Basically, I have latitude and longitude (on a grid) in two different columns. I am getting fed two-element lists (could be numpy arrays) of a new coordinate set and I want to check if it is a duplicate before I add it.

For example, my data:

``````df = pd.DataFrame([[4,8, 'wolf', 'Predator', 10],
[5,6,'cow', 'Prey', 10],
[8, 2, 'rabbit', 'Prey', 10],
[5, 3, 'rabbit', 'Prey', 10],
[3, 2, 'cow', 'Prey', 10],
[7, 5, 'rabbit', 'Prey', 10]],
columns = ['lat', 'long', 'name', 'kingdom', 'energy'])

newcoords1 = [4,4]
newcoords2 = [7,5]
``````

Is it possible to write one
`if`
statement to tell me whether there is already a row with that latitude and longitude. In pseudo code:

``````if newcoords1 in df['lat', 'long']:
print('yes! ' + str(newcoords1))
``````

(In the example,
`newcoords1`
should be
`false`
and
`newcoords2`
should be
`true`
.

Sidenote:
`(newcoords1[0] in df['lat']) & (newcoords1[1] in df['long'])`
doesn't work because that checks them independently, but I need to know if that combination appears in a single row.

you can do it this way:

``````In [140]: df.query('@newcoords2[0] == lat and @newcoords2[1] == long')
Out[140]:
lat  long    name kingdom  energy
5    7     5  rabbit    Prey      10

In [146]: df.query('@newcoords2[0] == lat and @newcoords2[1] == long').empty
Out[146]: False
``````

the following line will return a number of found rows:

``````In [147]: df.query('@newcoords2[0] == lat and @newcoords2[1] == long').shape[0]
Out[147]: 1
``````

or using NumPy approach:

``````In [103]: df[(df[['lat','long']].values == newcoords2).all(axis=1)]
Out[103]:
lat  long    name kingdom  energy
5    7     5  rabbit    Prey      10
``````

this will show whether at least one row has been found:

``````In [113]: (df[['lat','long']].values == newcoords2).all(axis=1).any()
Out[113]: True

In [114]: (df[['lat','long']].values == newcoords1).all(axis=1).any()
Out[114]: False
``````

Explanation:

``````In [104]: df[['lat','long']].values == newcoords2
Out[104]:
array([[False, False],
[False, False],
[False, False],
[False, False],
[False, False],
[ True,  True]], dtype=bool)

In [105]: (df[['lat','long']].values == newcoords2).all(axis=1)
Out[105]: array([False, False, False, False, False,  True], dtype=bool)
``````