Alexander - 8 months ago 54

R Question

I'm trying to calculate the angle between each row of my x, y, z data frame and a reference vector. So far, I use `dplyr` with a custom `angle` function. Here is a reproducible example:

```
set.seed(12345)
x <- replicate(1, c(replicate(1000, rnorm(50, 0, 0.01))))
y <- replicate(1, c(replicate(1000, rnorm(50, 0, 0.01))))
z <- replicate(1, c(replicate(1000, rnorm(50, 0.9, 0.01))))
ref_vector <- data.frame(ref_x = rep(0, 100), ref_y = rep(0, 100), ref_z = rep(1, 100))
set <- rep(seq(1, 1000), each = 50)
data_rep <- data.frame(x, y, z, ref_vector, set)

head(data_rep)
#              x            y         z ref_x ref_y ref_z set
# 1  0.005855288 -0.015472796 0.9059337     0     0     1   1
# 2  0.007094660 -0.013354359 0.9040137     0     0     1   1
# 3 -0.001093033 -0.014661486 0.9047502     0     0     1   1
# 4 -0.004534972 -0.002764655 0.9070553     0     0     1   1
# 5  0.006058875 -0.008339952 0.8926551     0     0     1   1
# 6 -0.018179560 -0.008412400 0.9055541     0     0     1   1
```

I define the angle between two vectors with this `angle` function:

```
angle <- function(x, y) {
  dot.prod <- x %*% y
  norm.x <- norm(x, type = "2")
  norm.y <- norm(y, type = "2")
  theta <- acos(dot.prod / (norm.x * norm.y))
  as.numeric(theta)
}
```
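As a quick sanity check of this definition, here is a minimal sketch that applies it to two pairs of vectors whose angles are known in advance. The test vectors and the names `right_angle` and `diag_angle` are made up for illustration and are not part of the original data.

```r
# Same definition as above, repeated so the snippet runs on its own.
angle <- function(x, y) {
  dot.prod <- x %*% y
  norm.x <- norm(x, type = "2")
  norm.y <- norm(y, type = "2")
  theta <- acos(dot.prod / (norm.x * norm.y))
  as.numeric(theta)
}

# Perpendicular vectors: the angle should be pi/2 radians.
right_angle <- angle(c(1, 0, 0), c(0, 1, 0))

# A vector halfway between the x and z axes, measured against the z axis:
# the angle should be pi/4 radians.
diag_angle <- angle(c(1, 0, 1), c(0, 0, 1))

# Convert to degrees, as done later with angle_rad * 180 / pi.
right_angle * 180 / pi
diag_angle * 180 / pi
```

The function returns radians, which is why the pipeline converts with `* 180 / pi` afterwards.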

Then let's apply it to `data_rep`:

```
library(dplyr)

system.time(df_angle <- data_rep %>%
  rowwise() %>%
  do(data.frame(., angle_rad = angle(unlist(.[1:3]), unlist(.[4:6])))) %>%
  group_by(set) %>%
  mutate(angle = angle_rad * 180 / pi, mean_angle = mean(angle)))
#  user  system elapsed
# 64.22    0.08   64.81
# Warning message:
# Grouping rowwise data frame strips rowwise nature
```

As you can see, this took around 1 minute, and that is not even my real data set, which has 350,000 rows and takes 10 minutes to process.

Is there any way to speed this up?

Thanks!

Answer

Just use a simple `mutate` statement instead of your `do(data.frame())` part. This improves performance quite a bit, because you no longer have to convert each row into a `data.frame`:

```
system.time(df_angle2 <- data_rep %>%
  rowwise() %>%
  mutate(angle_rad = angle(x = c(x, y, z), y = c(ref_x, ref_y, ref_z))) %>%
  group_by(set) %>%
  mutate(angle = angle_rad * 180 / pi, mean_angle = mean(angle)))
##  user  system elapsed
##  3.72    0.00    3.71
all.equal(df_angle, df_angle2)
## TRUE
```
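Since the reference vector is stored as columns here, the angle can probably also be computed with plain column arithmetic, dropping `rowwise()` altogether. This is a different technique from the `mutate` fix above, not a restatement of it. What follows is a minimal base R sketch: `small` (a reduced stand-in for `data_rep`) and `angle_cols` are names made up for this illustration, and no timing is claimed for it.

```r
# Row-by-row definition from the question, kept only to produce a reference result.
angle <- function(x, y) {
  theta <- acos((x %*% y) / (norm(x, type = "2") * norm(y, type = "2")))
  as.numeric(theta)
}

# Hypothetical small stand-in for data_rep: 10 sets of 50 rows each.
set.seed(12345)
n <- 500
small <- data.frame(
  x = rnorm(n, 0, 0.01),
  y = rnorm(n, 0, 0.01),
  z = rnorm(n, 0.9, 0.01),
  ref_x = 0, ref_y = 0, ref_z = 1,
  set = rep(1:10, each = 50)
)

# Vectorized version: arithmetic on whole columns instead of one call per row.
angle_cols <- function(x, y, z, rx, ry, rz) {
  dot <- x * rx + y * ry + z * rz
  acos(dot / (sqrt(x^2 + y^2 + z^2) * sqrt(rx^2 + ry^2 + rz^2)))
}

small$angle_rad <- with(small, angle_cols(x, y, z, ref_x, ref_y, ref_z))
small$angle <- small$angle_rad * 180 / pi
# Per-set mean repeated on every row, like group_by(set) followed by mutate().
small$mean_angle <- ave(small$angle, small$set)

# Reference: apply the original function to each row and compare.
row_by_row <- vapply(
  seq_len(n),
  function(i) angle(unlist(small[i, 1:3]), unlist(small[i, 4:6])),
  numeric(1)
)
all.equal(small$angle_rad, row_by_row)
```

The same column expression should also work inside an ordinary `mutate()` call without `rowwise()`, since every operation in it is vectorized.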

Source (Stackoverflow)