Alexander - 17 days ago 4x
R Question

# Things are slow with "dplyr": is there a faster way?

I'm trying to calculate the angle between each row of my x, y, z data frame and a reference vector. So far, I use `dplyr` to group the data and apply my `angle` function to get the relative angle. However, this is quite slow, even for the dummy data I provide here.

``````
set.seed(12345)

x <- replicate(1,c(replicate(1000,rnorm(50,0,0.01))))
y <- replicate(1,c(replicate(1000,rnorm(50,0,0.01))))
z <- replicate(1,c(replicate(1000,rnorm(50,0.9,0.01))))
ref_vector <- data.frame(ref_x=rep(0,100),ref_y=rep(0,100),ref_z=rep(1,100))
set <- rep(seq(1,1000),each=50)

data_rep <- data.frame(x,y,z,ref_vector,set)
``````

``````
head(data_rep)
#              x            y         z ref_x ref_y ref_z set
# 1  0.005855288 -0.015472796 0.9059337     0     0     1   1
# 2  0.007094660 -0.013354359 0.9040137     0     0     1   1
# 3 -0.001093033 -0.014661486 0.9047502     0     0     1   1
# 4 -0.004534972 -0.002764655 0.9070553     0     0     1   1
# 5  0.006058875 -0.008339952 0.8926551     0     0     1   1
# 6 -0.018179560 -0.008412400 0.9055541     0     0     1   1
``````

I define the angle between two vectors with this `angle` function:

``````
angle <- function(x,y){
  dot.prod <- x%*%y
  norm.x <- norm(x,type="2")
  norm.y <- norm(y,type="2")
  theta <- acos(dot.prod / (norm.x * norm.y))
  as.numeric(theta)
}
``````
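As a quick sanity check, the function computes theta = acos(x . y / (|x| |y|)), so simple test vectors should give known angles. The snippet below is only a sketch: it repeats the definition so it runs on its own, and the test vectors and the names `a_perp` and `a_diag` are made up for illustration.

```r
# Same definition as above, repeated so this snippet is self-contained.
angle <- function(x, y) {
  dot.prod <- x %*% y
  norm.x <- norm(x, type = "2")
  norm.y <- norm(y, type = "2")
  theta <- acos(dot.prod / (norm.x * norm.y))
  as.numeric(theta)
}

# Perpendicular unit vectors: the angle should be pi/2.
a_perp <- angle(c(1, 0, 0), c(0, 1, 0))

# A vector on the diagonal of the x-y plane against the x axis: pi/4.
a_diag <- angle(c(1, 1, 0), c(1, 0, 0))

print(c(a_perp, a_diag))
```

The result is in radians; multiply by `180 / pi` to get degrees.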

Then let's apply this to `data_rep`:

``````
library(dplyr)
system.time(df_angle <- data_rep %>%
  rowwise() %>%
  group_by(set) %>%

#     user  system elapsed
#    64.22    0.08   64.81
#    Warning message:
#    Grouping rowwise data frame strips rowwise nature
``````

As you can see, the process took around 1 minute, and this is not even my full data set. The real data has 350000 rows, and calculating the relative angle for it takes about 10 minutes.

Is there any way to speed up this process?

Thanks!

Just use a simple `mutate` statement instead of your `do(data.frame())` part. This improves performance quite a bit, because you no longer have to convert each row into a `data.frame`.
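Going a step further than what is described here, the same formula can be applied to whole columns at once, which avoids per-row work entirely. This is only a minimal base R sketch under assumptions: `df` is a small stand-in for `data_rep` with the same column names, `n` is an arbitrary size, and the clamping with `pmin`/`pmax` is an added safeguard that the original `angle` function does not have.

```r
set.seed(12345)

# Small stand-in for data_rep (same column names; the size n is an assumption).
n <- 200
df <- data.frame(
  x = rnorm(n, 0, 0.01), y = rnorm(n, 0, 0.01), z = rnorm(n, 0.9, 0.01),
  ref_x = 0, ref_y = 0, ref_z = 1
)

# Row-wise dot products and Euclidean norms, computed on whole columns.
dot    <- with(df, x * ref_x + y * ref_y + z * ref_z)
norm_v <- with(df, sqrt(x^2 + y^2 + z^2))
norm_r <- with(df, sqrt(ref_x^2 + ref_y^2 + ref_z^2))

# Clamp the cosine to [-1, 1] so rounding error cannot produce NaN in acos().
cos_theta <- pmin(pmax(dot / (norm_v * norm_r), -1), 1)
df$angle_rad <- acos(cos_theta)
```

Staying within the original pipeline, the `mutate` version looks like this: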

``````
system.time(df_angle2 <- data_rep %>%
  rowwise() %>%
  mutate(angle_rad=angle(x = c(x,y,z),y = c(ref_x,ref_y,ref_z))) %>%
  group_by(set) %>%