Kartheek Palepu - 1 year ago
R Question

Optimizing the code for error minimization

I have written the code below to minimize the error by varying the value of alpha (iterating over a grid of candidate values).

npoints = 10000
Y = round(runif(npoints), 3)
OY = sample(c(0, 1, 0.5), npoints, replace = T)

minimizeAlpha = function(Y, OY, alpha) {
    PY = alpha*Y                                    # predicted values
    error = OY - PY
    squaredError = sapply(error, function(x) x*x)
    sse = sum(squaredError)
    return(sse)
}

# Iterate over 10000 candidate values of alpha
alphas = seq(0.0001, 1, 0.0001)
sse = sapply(alphas, function(x) minimizeAlpha(Y, OY, x))
print(alphas[sse == min(sse)])

I have used sapply for basic optimization. But if the number of points is more than 10000, this code runs forever. Is there a better way to implement this, or a standard technique for this kind of optimization? If so, can you please help me optimize the code?

Note: I need the value of alpha with at least 4 decimals.

Any help is appreciated.

Answer

Using sapply instead of a for loop isn’t more efficient; that’s a misconception. It’s merely often simpler code.

However, you can actually take advantage of vectorisation in your code — and that would be faster.

For instance, sapply(error, function(x) x*x) can simply be replaced by error * error, because arithmetic operators in R already work element-wise on whole vectors. The sum of squared errors is thus simply sum((OY - PY) ** 2).
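
As a quick check of that equivalence, here is a minimal sketch. The three values in error are made up for illustration and are not taken from the question's data.

```r
# Hypothetical example values, chosen only for illustration.
error <- c(0.5, -0.2, 1.0)

# Element-by-element version, as written in the question.
squaredSlow <- sapply(error, function(x) x * x)

# Vectorised version: the multiplication is applied to the whole vector at once.
squaredFast <- error * error

# Both approaches give the same vector of squared errors.
print(all.equal(squaredSlow, squaredFast))

# Sum of squared errors from the vectorised form.
print(sum(squaredFast))
```

The vectorised form avoids one R function call per element, which is where most of the time goes when the vector is long.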

Your whole function thus boils down to:

minimizeAlpha = function(Y, OY, alpha)
    sum((OY - alpha * Y) ** 2)

This should be more efficient — but first and foremost it’s better code and more readable.
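
To see how the one-line function fits into the original grid search, here is a rough sketch. The set.seed call is an assumption added only to make the run reproducible. The last two steps are not part of the answer above: the closed-form expression is the standard least-squares solution for a model without intercept, and optimize() is the one-dimensional minimiser from base R's stats package. Both are shown as possible alternatives to the grid, not as the author's method.

```r
# Vectorised sum of squared errors for a single candidate alpha.
minimizeAlpha <- function(Y, OY, alpha) sum((OY - alpha * Y)^2)

set.seed(1)  # assumption: fixed seed, only for reproducibility
npoints <- 10000
Y  <- round(runif(npoints), 3)
OY <- sample(c(0, 1, 0.5), npoints, replace = TRUE)

# Grid search as in the question, now calling the vectorised function.
alphas <- seq(0.0001, 1, 0.0001)
sse <- sapply(alphas, function(a) minimizeAlpha(Y, OY, a))
alphaGrid <- alphas[which.min(sse)]

# Alternative 1: closed form. Setting the derivative of
# sum((OY - alpha * Y)^2) with respect to alpha to zero gives this ratio.
alphaExact <- sum(OY * Y) / sum(Y^2)

# Alternative 2: numerical one-dimensional minimisation on [0, 1].
alphaOptim <- optimize(function(a) minimizeAlpha(Y, OY, a),
                       interval = c(0, 1), tol = 1e-8)$minimum

print(c(grid = alphaGrid, exact = alphaExact, optimize = alphaOptim))
```

The three values should agree to about four decimals, since the grid has a step of 0.0001. The closed form needs no iteration at all, so it is worth trying first if the error really is a plain sum of squares.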