seulberg1 - 1 year ago 71
R Question

# R efficient way to sort a Matrix by row

I have a matrix "multiOrderPairsFlat" of 2m+ rows and 2 columns where each cell contains a SKU description (e.g. "Pipe2mSteel" or "Bushing1inS") and would like to sort every row alphabetically, so that in every row, e.g. "Bushings1inS" is in the first column and "Pipe2mSteel" in the second.

However, if I run:

``````for (i in 1:length(multiOrderPairsFlat)){
multiOrderPairsFlat[i,] <- sort(multiOrderPairsFlat[i,])
}
``````

It takes forever and I doubt this is the quickest way of dealing with this problem. Do you have any advice on how to solve this more efficiently, e.g. by vectorizing the operation?

Thanks for helping out;)
Best
seulberg1

It may be better to use `pmin/pmax` after converting to `data.frame` (as there are only two columns)

`````` system.time({
df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors=FALSE)
res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))

})
#    user  system elapsed
#  0.49    0.02    0.50

system.time({
for (i in 1:nrow(multiOrderPairsFlat)){
multiOrderPairsFlat[i,] <- sort(multiOrderPairsFlat[i,])
}
})

#  user  system elapsed
#  11.99    0.00   12.00

all.equal(as.matrix(res), multiOrderPairsFlat, check.attributes=FALSE)
#[1] TRUE
``````

Checking the memory allocation

``````library(profvis)

profvis({
df1 <- as.data.frame(multiOrderPairsFlat, stringsAsFactors=FALSE)
res <- data.frame(First = do.call(pmin, df1), Second = do.call(pmax, df1))

})

#3.3 MB
profvis({
for (i in 1:nrow(multiOrderPairsFlat)){
multiOrderPairsFlat[i,] <- sort(multiOrderPairsFlat[i,])
}
})

#12.8 MB
``````

### data

``````set.seed(24)
multiOrderPairsFlat <- cbind(sample(c("Pipe2mSteel" , "Bushing1inS"), 1e6, replace=TRUE),
sample(c("Pipe2mSteel" , "Bushing1inS"), 1e6, replace=TRUE))
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download