Arefo Arefo - 9 months ago 127
R Question

R Check a row of strings, if equal, assign equal ID, less time consuming

I'm fairly new to R and was wondering if anyone here has a better solution to my problem, as mine is too time-consuming. I know R is not very "for-loop-friendly", so I am sure there is a better way to solve this.

I have a data frame where x is a text string and y is a numeric id:

x = c("a", "b", "c", "b", "a")
y = c(1,2,3,4,5)
df <- data.frame(x, y)

I want to find all matches in column x and assign them the same numeric value as the first matching row's y. I have solved this with the following:


# For each row i, copy its y to every later row j with the same x
for (i in seq_len(nrow(df))) {
  for (j in i:nrow(df)) {
    if (df$x[j] == df$x[i]) {
      df$y[j] <- df$y[i]
    }
  }
}

The problem is that I have a fairly large dataset, which makes this process take a lot of time! I hope someone here knows a less time-consuming alternative!

Answer Source

If your dataset is indeed large, then data.table will probably be the fastest solution (see benchmarks here).


library(data.table)
setDT(df)[, y := first(y), by = x]
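
If you would rather stay in base R, a vectorized lookup with match() should also avoid the nested loops. This is only a minimal sketch on the toy data from the question, not a benchmarked alternative; the helper name first_idx is made up for illustration.

```r
# Toy data from the question
x <- c("a", "b", "c", "b", "a")
y <- c(1, 2, 3, 4, 5)
df <- data.frame(x, y)

# For each row, match() returns the index of the FIRST row holding the same x
first_idx <- match(df$x, df$x)

# Replace every y with the y found at that first occurrence
df$y <- df$y[first_idx]

print(df)
```

On the example data this leaves y as 1, 2, 3, 2, 1, which is what the loop in the question is meant to produce. Because match() scans the column once instead of comparing every pair of rows, it should scale much better than the double loop.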