In R, I have a reasonably large data frame (d) which is 10500 by 6000. All values are numeric.
It has many NA values scattered across its rows and columns, and I am looking to replace them with zeros. I have used:
d[is.na(d)] <- 0
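Here is a minimal sketch of that idiom on a toy data frame (the column names x and y are made up for illustration):

# Small numeric data frame with NAs in different rows and columns
d <- data.frame(x = c(1, NA, 3), y = c(NA, 5, 6))

# is.na(d) returns a logical matrix the same shape as d,
# so a single assignment replaces every NA at once
d[is.na(d)] <- 0
d
#   x y
# 1 1 0
# 2 0 5
# 3 3 6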
I assume that all the columns are numeric; otherwise, assigning 0 to the NAs wouldn't be sensible.
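You can verify that assumption before doing the blanket replacement; a quick sketch (not from the original question):

# Stop early if any column is non-numeric, since writing 0 there is meaningless
stopifnot(all(vapply(d, is.numeric, logical(1))))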
I get the following timings, with approximately 10,000 NAs:
> M <- matrix(0, 10500, 6000)
> set.seed(54321)
> r <- sample(1:10500, 10000, replace=TRUE)
> c <- sample(1:6000, 10000, replace=TRUE)
> M[cbind(r, c)] <- NA
> D <- data.frame(M)
> sum(is.na(M))  # check
[1] 9999
> sum(is.na(D))  # check
[1] 9999
> system.time(M[is.na(M)] <- 0)
   user  system elapsed
   0.19    0.12    0.31
> system.time(D[is.na(D)] <- 0)
   user  system elapsed
   3.87    0.06    3.95
So, with this number of NAs, I get about an order of magnitude speedup by using a matrix. (With fewer NAs, the difference is smaller.) But even the data-frame version takes only about 4 seconds on my modest laptop -- much less time than it took to write this answer. If the problem really is of this magnitude, why is 4 seconds too slow?
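If 4 seconds really were too slow, the timings above suggest a workaround: round-trip through a matrix. A sketch, assuming every column is numeric so the conversion is lossless:

# as.matrix() gives a plain numeric matrix for an all-numeric data frame
m <- as.matrix(d)
m[is.na(m)] <- 0       # fast matrix-indexed replacement
d <- as.data.frame(m)  # back to a data frame

This avoids the data frame's comparatively slow [<- method for the replacement step itself.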
I hope this helps.