seulberg1 seulberg1 - 1 month ago 6
R Question

R: efficient way to sort a matrix by row

I have a matrix "multiOrderPairsFlat" with more than 2 million rows and 2 columns, where each cell contains a SKU description (e.g. "Pipe2mSteel" or "Bushing1inS"). I would like to sort every row alphabetically, so that in every row, e.g., "Bushing1inS" is in the first column and "Pipe2mSteel" in the second.

However, if I run:

for (i in 1:nrow(multiOrderPairsFlat)) {
  multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
}


It takes forever and I doubt this is the quickest way of dealing with this problem. Do you have any advice on how to solve this more efficiently, e.g. by vectorizing the operation?

Thanks for helping out;)
Best
seulberg1

Answer

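As there are only two columns, it may be better to convert the matrix to a data.frame and use pmin/pmax, which return the element-wise minimum and maximum of the two columns in a single vectorized call.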
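As a rough illustration of why this works, here is a minimal sketch on two short character vectors. The names `first`, `second`, `lo` and `hi` are made up for this example and are not part of the question's data. For character input, pmin/pmax compare strings the same way `<` does, so the smaller value of each pair is the one that sorts first alphabetically.

```r
# Two hypothetical SKU columns, one pair per position
first  <- c("Pipe2mSteel", "Bushing1inS", "Bushing1inS")
second <- c("Bushing1inS", "Pipe2mSteel", "Bushing1inS")

lo <- pmin(first, second)  # alphabetically smaller entry of each pair
hi <- pmax(first, second)  # alphabetically larger entry of each pair

# Combine into a two-column matrix with every row sorted
cbind(lo, hi)
```

Because both calls operate on whole vectors at once, no per-row call to sort() is needed.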

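system.time({
  # Convert to a data.frame so each column can be passed to pmin/pmax
  df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors = FALSE)
  # Smaller value of each row goes first, larger value second
  res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))
})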
 #    user  system elapsed 
 #  0.49    0.02    0.50 

# Original row-by-row loop, for comparison
system.time({
  for (i in 1:nrow(multiOrderPairsFlat)) {
    multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
  }
})
#    user  system elapsed 
#   11.99    0.00   12.00 

all.equal(as.matrix(res), multiOrderPairsFlat, check.attributes=FALSE)
#[1] TRUE
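Note that the comparison above relies on the loop having already sorted multiOrderPairsFlat in place. A sketch of a check that does not depend on run order, using a small sample and a per-row sort() as the reference, could look like this. The names `skus`, `small`, `by_pminmax` and `by_sort` are hypothetical, and the sample size of 1000 is arbitrary.

```r
set.seed(24)
skus <- c("Pipe2mSteel", "Bushing1inS")
small <- cbind(sample(skus, 1000, replace = TRUE),
               sample(skus, 1000, replace = TRUE))

# Vectorized version: element-wise minimum and maximum of the two columns
by_pminmax <- cbind(pmin(small[, 1], small[, 2]),
                    pmax(small[, 1], small[, 2]))

# Reference version: sort each row separately; apply() returns the sorted
# rows as columns, so the result is transposed back
by_sort <- t(apply(small, 1, sort))

# Both approaches should agree cell by cell
all(by_pminmax == by_sort)
```

This variant also stays in matrix form throughout, which avoids the data.frame conversion when a matrix is what is needed downstream.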

Checking the memory allocation

library(profvis)

profvis({
  df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors = FALSE)
  res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))
})
# 3.3 MB

profvis({
  for (i in 1:nrow(multiOrderPairsFlat)) {
    multiOrderPairsFlat[i, ] <- sort(multiOrderPairsFlat[i, ])
  }
})
# 12.8 MB

Data

set.seed(24)
multiOrderPairsFlat <- cbind(
  sample(c("Pipe2mSteel", "Bushing1inS"), 1e6, replace = TRUE),
  sample(c("Pipe2mSteel", "Bushing1inS"), 1e6, replace = TRUE)
)