Arefo Arefo - 29 days ago 8
R Question

R Check a row of strings, if equal, assign equal ID, less time consuming

I'm fairly new to R and was wondering if anyone here has a better solution to my problem, as mine is too slow. I know R is not very "for-loop-friendly", so I am sure there is a better way to solve this.

I have a data frame where x is a text string and y is a numeric id:

x = c("a", "b", "c", "b", "a")
y = c(1,2,3,4,5)
df <- data.frame(x, y)


I want to find all matches in column x and assign them the same numeric value as the first matching row's y. I have solved this with the following:

# Nested loop: O(n^2) comparisons, which is why it is slow on large data
for (i in seq_len(nrow(df))) {
  for (j in i:nrow(df)) {
    if (df$x[j] == df$x[i]) {
      # copy the y of the earlier row i onto the matching later row j
      df$y[j] <- df$y[i]
    }
  }
}


The problem is that I have a fairly large dataset, so this takes a lot of time. I hope someone here knows a faster alternative!

Answer

If your dataset is indeed large, then data.table will probably be the fastest solution (see benchmarks here).

library(data.table)
setDT(df)

df[, y := first(y), by = x]
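
If you would rather not add a package, the same grouping can probably be done in base R. The following is a minimal sketch, not part of the original answer: it rebuilds the question's example under the hypothetical name df_base and relies on match() returning the position of the first match, so matching x against itself gives each row the index of its first occurrence. No claim is made here about how its speed compares with data.table.

```r
# Rebuild the example data from the question (base R only, no packages).
# df_base is a hypothetical name used only for this sketch.
df_base <- data.frame(
  x = c("a", "b", "c", "b", "a"),
  y = c(1, 2, 3, 4, 5)
)

# Alternative spelling of the same idea: ave() applies a function per group
# of x and returns a vector aligned with the original rows.
y_ave <- ave(df_base$y, df_base$x, FUN = function(v) v[1])

# match(x, x) gives, for every row, the index of the first row with the same x.
first_row <- match(df_base$x, df_base$x)

# Look up y at those first-occurrence rows; y becomes 1 2 3 2 1.
df_base$y <- df_base$y[first_row]

print(df_base)
```

Both versions are vectorised, so they avoid the row-by-row comparisons of the nested loop.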