Arnand - 4 months ago 28

R Question

I would like to speed up my solution in R.

I've got two data frames, let's say:

df_one:

| A | B | C | D | same |
|---|---|---|---|------|
| 1 | 3 | 2 | 4 | NA |
| 6 | 5 | 1 | 3 | NA |
| 5 | 3 | 7 | 3 | NA |
| 3 | 4 | 8 | 3 | NA |

And df_two:

| A | B |
|---|---|
| 1 | 3 |
| 6 | 2 |
| 5 | 3 |

If the values in columns A and B of a row in df_one are the same as in a row of df_two (or B is within .5 of it), I want a 1, otherwise a 0, in an extra column in df_one (`df_one$same`).

I did this with the following code:

```
df_one$same <- NA

for (i in 1:nrow(df_one)) {
  for (j in 1:nrow(df_two)) {
    # values within .5 of df_two's B, in steps of .1
    distance <- seq(df_two[j, 2] - .5, df_two[j, 2] + .5, by = .1)
    print(i)
    if ((df_one[i, 1] == df_two[j, 1]) && (df_one[i, 2] %in% distance)) {
      df_one[i, 5] <- 1
      break
    } else {
      df_one[i, 5] <- 0
    }
  }
}
```

Can anyone help me with a faster solution?
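
A side note on the tolerance check: testing membership in `seq(..., by = .1)` only recognises values that sit exactly on the 0.1 grid, so a range comparison may be closer to what "within .5" means. The sketch below illustrates the difference; `b_one` and `b_two` are hypothetical values chosen for the illustration, not taken from the data above.

```r
b_two <- 3
distance <- seq(b_two - .5, b_two + .5, by = .1)

b_one <- 3.25                          # hypothetical value, off the 0.1 grid

on_grid  <- b_one %in% distance        # FALSE: 3.25 is not one of the 0.1 steps
in_range <- abs(b_one - b_two) <= .5   # TRUE: 3.25 lies within 3 +/- .5
```

The range comparison is also vectorised, so it can be applied to a whole column at once instead of once per loop iteration.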

Answer

A quicker solution to what I *think* you are asking is to use `left_join` from `dplyr` and check explicitly for the matches:

```
library(dplyr)

left_join(df_one, df_two, by = "A") %>%
  mutate(same = B.x == B.y)
```

gives

```
  A B.x C D  same B.y
1 1   3 2 4  TRUE   3
2 6   5 1 3 FALSE   2
3 5   3 7 3  TRUE   3
4 3   4 8 3    NA  NA
```
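
The question asks for a 1/0 column and mentions a tolerance of .5 on B, which the exact comparison above does not cover. The following is a minimal base R sketch of one way to get there without a loop. It assumes each value of A occurs at most once in df_two, and that "within .5" is meant inclusively; the helper names `idx`, `b_two` and `tol` are introduced here for illustration only.

```r
df_one <- data.frame(A = c(1, 6, 5, 3), B = c(3, 5, 3, 4),
                     C = c(2, 1, 7, 8), D = c(4, 3, 3, 3), same = NA)
df_two <- data.frame(A = c(1, 6, 5), B = c(3, 2, 3))

tol <- 0.5  # assumed tolerance on B

# Assumption: A is unique in df_two, so match() finds the only candidate row.
idx   <- match(df_one$A, df_two$A)  # row of df_two with the same A, NA if none
b_two <- df_two$B[idx]              # B from df_two, NA where A has no match

# 1 if A matches and B is within tol, otherwise 0 (including unmatched rows)
df_one$same <- as.integer(!is.na(idx) & abs(df_one$B - b_two) <= tol)
```

With the example data this sets `same` to 1, 0, 1, 0. If A can repeat in df_two, `match()` only looks at the first occurrence, so a join followed by a grouped `any()` would be needed instead.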