sven b - 4 months ago 7
R Question

# Performance Issue while operating among two columns of a dataframe

Given a dataframe with two columns:

• length (length of elements)

• findLengthOf (a comma-separated string of values): the indexes of the elements whose length is needed

So, for every row, one has to look up the lengths at all indexes listed in the second column and put the result in a third column.
Please see the example below, where we look up the length at index 1637 and obtain 1835:

``````
> df$length[1637]
[1] 1835

# Input (first rows)
  length findLengthOf
1   6434 1637,386....
2   4272 4322,414....
3   7338 2052,639....
4   4932 190,1567....
5   2397 8963,844....
6   4405 103,4346....

# Desired output
  length findLengthOf           result
1   6434 1637,386.... 1835, 2404, 4689
2   4272 4322,414.... 1184, 2721, 7215
3   7338 2052,639.... 5253, 2998, 6153
4   4932 190,1567.... 2931, 6496, 7784
5   2397 8963,844.... 3796, 3488, 6555
6   4405 103,4346.... 1662, 5481, 1244

# Reproducible example data
set.seed(123)
df <- data.frame(length = sample(1e4),
                 findLengthOf = I(replicate(1e4, paste(sample(1:10000,1),sample(1:10000,1),sample(1:10000,1),sep=","), simplify = FALSE)))

# Current (slow) approach: split each string, then look up the rows in df
df$result=lapply(lapply(df$findLengthOf,strsplit,split=","), function(x){df[x[[1]],"length"]})
``````

The code works, but it takes too long. How can I improve the speed?
Also, why does

``````
head(lapply(df$findLengthOf,strsplit,split=","))
``````

always return this weird list of lists with:

``````
[[1]]
[[1]][[1]]
[1] "7744" "1346" "4626"
``````

Is there a way to avoid these double brackets?
Any response is greatly appreciated!
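
On the double brackets: `strsplit` always returns a list with one element per input string, so calling it through `lapply` on single strings wraps each result in an extra list of length one. A minimal sketch of the difference, using a small hypothetical list `x` that stands in for `df$findLengthOf`, and assuming the column can be flattened with `unlist` first:

```r
# Hypothetical stand-in for the list column df$findLengthOf
x <- list("7744,1346,4626", "12,7,3")

# One strsplit call per element: each call returns a list of length 1,
# so the overall result is a list of lists (hence [[1]][[1]])
nested <- lapply(x, strsplit, split = ",")

# One strsplit call on the whole character vector: the result is a flat
# list with one character vector per input string
flat <- strsplit(unlist(x), ",", fixed = TRUE)

str(nested[[1]])  # a list holding one character vector
str(flat[[1]])    # the character vector itself
```

With the flat form, the `x[[1]]` unwrapping inside the anonymous function is no longer needed.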

Following David's suggestion (set fixed=T), the timing barely changes:

``````
> ptm <- proc.time()
> df$result=lapply(lapply(df$findLengthOf,strsplit,split=",",fixed=T), function(x){df[x[[1]],"length"]})
> proc.time() - ptm
   user  system elapsed
 17.220   0.000  17.147
> ptm <- proc.time()
> df$result=lapply(lapply(df$findLengthOf,strsplit,split=","), function(x){df[x[[1]],"length"]})
> proc.time() - ptm
   user  system elapsed
 17.260   0.000  17.142
``````
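
Since both timings are essentially the same, the split itself is probably not the bottleneck. A likely candidate (an assumption, not measured here) is `df[x[[1]], "length"]`, which looks rows up by character row name on every call. The sketch below splits once, converts to integer, and indexes the plain `length` vector instead. The names `n`, `lens`, `idx`, `slow` and `fast` are made up for illustration, and `n` is smaller than the 1e4 in the question so the comparison runs quickly:

```r
set.seed(123)
n <- 1000  # assumption: reduced size, the question uses 1e4
df <- data.frame(length = sample(n),
                 findLengthOf = I(replicate(n, paste(sample(n, 1), sample(n, 1), sample(n, 1), sep = ","),
                                            simplify = FALSE)))

# Approach from the question: data frame lookup by character row name
slow <- lapply(lapply(df$findLengthOf, strsplit, split = ","),
               function(x) df[x[[1]], "length"])

# Sketch: split all strings in one call, then index the vector by integer position
lens <- df$length
idx  <- strsplit(unlist(df$findLengthOf), ",", fixed = TRUE)
fast <- lapply(idx, function(i) lens[as.integer(i)])

# Both approaches should give the same values
all(unlist(slow) == unlist(fast))

df$result <- fast
```

Wrapping each variant in `system.time()` on the full-size data would show whether this actually helps on a given machine.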

``````
library(data.table)