Tsvetan Nikolov - 1 year ago
R Question

R loop optimisation / Loop is way too time-consuming

The following loop takes ages. Is there any way to do this in a more time-efficient way? The data.table below has 27 variables and more than 600k observations.

library(data.table)
data <- read.table("file.txt", header = T, sep= "|")
colnames(data)[c(1)] <- c("X")
data <- as.data.table(data)
vector <- vector()
# the first row always starts a new run
n=1; vector[1]=1
for(i in 2:nrow(data)){
if(data[["X"]][i] != data[["X"]][i-1]){
n=1; vector[i]=1}
else {
n=n+1; vector[i]=n}}

Basically, I need to index every appearance of a unique entry in X, i.e. the first time it appeared, the second time it appeared, etc., and then merge this into the existing data as an additional column. However, I got stuck at building the vector.

Thank you.

Answer

First off, use fread, which reads large delimited files much faster than read.table and returns a data.table directly:

DT <- fread("file.txt", sep = "|")

Next, use setnames, which renames columns by reference instead of copying the table:

setnames(DT, 1, "X")

Finally, use rowid:

DT[ , vector := rowid(X)]    
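
To see what this computes, here is a minimal sketch on a made-up six-row table (the values are purely illustrative, not from the asker's file). It assumes a data.table version that already provides rowid.

```r
library(data.table)

# Toy table standing in for the real file; the values are made up.
DT <- data.table(X = c("a", "a", "b", "a", "b", "b"))

# rowid() numbers each occurrence of a value across the whole column,
# regardless of whether equal values are adjacent.
DT[, vector := rowid(X)]

DT$vector
# 1 2 1 3 2 3
```

Note that the third "a" gets 3 even though a "b" sits between it and the earlier ones.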

rowid is available from development version 1.9.7. Install via:

install.packages(pkgs = "data.table",
                 repos = "http://Rdatatable.github.io/data.table",
                 type = "source")

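To do this with the current CRAN version (1.9.6), group by rleid instead. This counts within each run of consecutive equal values, as the loop in the question does, so it gives the same result as rowid(X) only when equal values of X sit next to each other: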

DT[ , vector := 1:.N, by = rleid(X)]
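
A small sketch of the grouped version on made-up data, with a base R cross-check. The toy values and the name base_counter are illustrative assumptions only.

```r
library(data.table)

# Made-up column in which "a" and "b" recur after a gap.
DT <- data.table(X = c("a", "a", "b", "a", "b", "b"))

# rleid() gives each run of consecutive equal values its own id;
# 1:.N then counts the rows within each run.
DT[, vector := 1:.N, by = rleid(X)]

# Base R cross-check: the same run-based counter without data.table.
base_counter <- sequence(rle(DT$X)$lengths)

DT$vector
# 1 2 1 1 1 2
```

The counter restarts at every change in X, so a value that reappears later starts again from 1. If a running count per value is wanted instead, sorting by X first (or using rowid) avoids the restart.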