ToNoY - 2 months ago 8

R Question

I was trying to do an extensive computation in R. Eighteen hours have passed but my RStudio seems to continue to work. I'm not sure if I could have written the script in a different way to make it faster. I was trying to implement a CrankâNicolson type method over a 50000 by 350 matrix as shown below:

`#defining the discretization of cells`

dt<-1

t<-50000

dz<-0.0075

z<-350*dz

#velocity & diffusion

v<-2/(24*60*60)

D<-0.02475/(24*60*60)

#make the big matrix (all filled with zeros)

m <- as.data.frame(matrix(0, t/dt+1, z/dz+2)) #extra columns/rows for boundary conditions

#fill the first and last columns with constant boundary values

m[,1]<-400

m[,length(m)]<-0

#implement the calculation

for(j in 2:(length(m[1,])-1)){

for(i in 2:length(m[[1]])){

m[i,][2:length(m)-1][[j]]<-m[i-1,][[j]]+

D*dt*(m[i-1,][[j+1]]-2*m[i-1,][[j]]+m[i-1,][[j-1]])/(dz^2)-

v*dt*(m[i-1,][[j+1]]-m[i-1,][[j-1]])/(2*dz)

}}

Is there a way to know how long would it take for R to implement it? Is there a better way of constructing the numerical calculation? At this point, I feel like excel could have been faster!!

Answer

Just making a few simple optimisations really helps here. The original version code of your code would take ~ 5 days on my laptop. Using a matrix and calculating just once values that are reused in the loop, we bring this down to around 7 minutes

And think about messy constructions like

```
m[i,][2:NC-1][[j]]
```

This is equivalent to

```
m[[i, j]]
```

which would be faster (as well as much easier to understand). Making this change further reduces the runtime by another factor of over 2, to around 3 minutes

Putting this together we have

```
dt<-1
t<-50000
dz<-0.0075
z<-350*dz
#velocity & diffusion
v<-2/(24*60*60)
D<-0.02475/(24*60*60)
#make the big matrix (all filled with zeros)
m <- (matrix(0, t/dt+1, z/dz+2)) #extra columns/rows for boundary conditions
# cache a few values that get reused many times
NC = NCOL(m)
NR = NROW(m)
C1 = D*dt / dz^2
C2 = v*dt / (2*dz)
#fill the first and last columns with constant boundary values
m[,1]<-400
m[,NC]<-0
#implement the calculation
for(j in 2:(NC-1)){
for(i in 2:NR){
ma = m[i-1,]
ma.1 = ma[[j+1]]
ma.2 = ma[[j-1]]
m[[i,j]] <- ma[[j]] + C1*(ma.1 - 2*ma[[j]] + ma.2) - C2*(ma.1 - ma.2)
}
}
```

If you need to go even faster than this, you can try out some more optimisations. For example see here for how different ways of indexing the same element can have very different execution times. In general it is better to refer to column first, then row.

If all the optimisations you can do in R are not enough for your speed requirements, then you might implement the loop in RCpp instead.