leiberl - 1 year ago 87

R Question

I essentially want to implement the equivalent of R's `match()` function in Python, using pandas data frames, without using a for-loop.

In R, `match()` returns a vector of the positions of the (first) matches of its first argument in its second.

Let's say that I have two data frames, A and B, both of which include the column C, where

`A$C = c('a','b')`

`B$C = c('c','c','b','b','c','b','a','a')`

In R we would get

`match(A$C,B$C) = c(7,3)`

What is an equivalent method in Python for columns in pandas data frames that doesn't require looping through the values?

Answer

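You can first use `drop_duplicates` and then either boolean indexing with `isin` or a `merge`.

Python counts from `0`, so add `1` to get the same numbers as R. Note that both approaches return the positions in the order the values appear in B, not in the order of A, so here you get `2, 6` (`3, 7` after adding `1`) rather than R's `7, 3`.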

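```
import pandas as pd

A = pd.DataFrame({'c': ['a', 'b']})
B = pd.DataFrame({'c': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# Keep only the first occurrence of each value; the original index is preserved
B = B.drop_duplicates('c')
print(B)
#    c
# 0  c
# 2  b
# 6  a

# Rows of B whose value also appears in A (in B's order)
print(B[B.c.isin(A.c)])
#    c
# 2  b
# 6  a

# Their index labels are the 0-based positions of the first matches
print(B[B.c.isin(A.c)].index)
# Int64Index([2, 6], dtype='int64')
```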

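```
# Alternative: merge on 'c'; reset_index() keeps the original position in the 'index' column
print(pd.merge(B.reset_index(), A))
#    index  c
# 0      2  b
# 1      6  a

print(pd.merge(B.reset_index(), A)['index'])
# 0    2
# 1    6
# Name: index, dtype: int64
```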
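
The snippets above return the positions in B's order. If you need them in A's order, as R's `match()` gives them, one possible sketch is to build a value-to-first-position lookup from B and map A's column through it. This is not part of the original answer. The names `first`, `lookup`, `positions` and `r_style` are made up for illustration, and it assumes B still has its default `RangeIndex`, so that index labels equal 0-based positions.

```python
import pandas as pd

A = pd.DataFrame({'c': ['a', 'b']})
B = pd.DataFrame({'c': ['c', 'c', 'b', 'b', 'c', 'b', 'a', 'a']})

# Assumption: B has the default RangeIndex, so index labels are 0-based positions.
# Keep the first occurrence of each value and build a value -> position lookup.
first = B.drop_duplicates('c')
lookup = pd.Series(first.index, index=first['c'])

# Look up each value of A in turn, preserving A's order.
# A value that is missing from B would come back as NaN.
positions = A['c'].map(lookup)

# Shift to 1-based positions to mirror R's match().
r_style = (positions + 1).tolist()
print(r_style)
# [7, 3]
```

Because `map` walks through A's values, the result lines up with A row by row, which is what `match(A$C, B$C)` does in R.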