Arefo - 7 months ago 95

R Question

I'm fairly new to R and was wondering whether anyone here has a better solution to my problem, as mine is too time-consuming. I know R is not very "for-loop-friendly", so I am sure there is a better way to solve this.

I have a data frame where x is a text string and y is a numeric id:

```
x = c("a", "b", "c", "b", "a")
y = c(1,2,3,4,5)
df <- data.frame(x, y)
```

I want to find all matches in column x and assign them the same numeric value as the first match in y. I have solved this with the following:

```
library(foreach)
library(iterators)

for(i in 1:NROW(df)) {
  for(j in i:NROW(df)) {
    if(df$x[j] == df$x[i]){
      df$y[j] <- df$y[i]
    }
    j = j + 1
  }
  i = i + 1
}
```

The problem is that I have a fairly large dataset, which makes this process take a lot of time. I hope someone here knows a less time-consuming alternative!

Answer

If your dataset is indeed large, then data.table will probably be the fastest solution (see benchmarks here).

```
library(data.table)
setDT(df)
df[, y := first(y), by = x]
```
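If adding a dependency is not an option, the same result can probably be reached in base R as well. The following is only a sketch and not part of the original answer: it relies on `match()` returning the position of the first occurrence of each value, and the helper name `first_idx` is made up for illustration. It has not been benchmarked against the data.table version.

```r
# Sample data from the question
x <- c("a", "b", "c", "b", "a")
y <- c(1, 2, 3, 4, 5)
df <- data.frame(x, y)

# For each row, match() gives the index of the first row with the same x
# (first_idx is a hypothetical helper name, not from the original post)
first_idx <- match(df$x, df$x)

# Replace every y with the y found at that first matching row
df$y <- df$y[first_idx]

print(df$y)  # 1 2 3 2 1
```

This is fully vectorized, so it avoids the nested loop from the question while giving the same values for the sample data.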

Source (Stack Overflow)