useR useR - 4 months ago 26
R Question

Argument "partial" of the sort function in R

?sort
states that the
partial
argument may be
NULL
or a vector of indices for partial sorting.

I tried:

x <- c(1,3,5,2,4,6,7,9,8,10)
sort(x)
## [1] 1 2 3 4 5 6 7 8 9 10
sort(x, partial=5)
## [1] 1 3 4 2 5 6 7 9 8 10
sort(x, partial=2)
## [1] 1 2 5 3 4 6 7 9 8 10
sort(x, partial=4)
## [1] 1 2 3 4 5 6 7 9 8 10


I am not sure what
partial
means when sorting a vector.

Answer

As ?sort states,

If partial is not NULL, it is taken to contain indices of elements of the result which are to be placed in their correct positions in the sorted array by partial sorting.

In other words, the following assertion is always true:

 stopifnot(sort(x, partial=pt_idx)[pt_idx] == sort(x)[pt_idx])

for any x and pt_idx, e.g.

 x <- sample(100) # input vector
 pt_idx <- sample(1:100, 5) # indices for partial arg

This behavior is different from the one defined in the Wikipedia article on partial sorting. In R sort()'s case we are not necessarily computing k smallest elements.

For example, if

print(x)
##  [1]  91  85  63  80  71  69  20  39  78  67  32  56  27  79   9  66  88  23  61  75  68  81  21  90  36  84  11   3  42  43
##  [31]  17  97  57  76  55  62  24  82  28  72  25  60  14  93   2 100  98  51  29   5  59  87  44  37  16  34  48   4  49  77
##  [61]  13  95  31  15  70  18  52  58  73   1  45  40   8  30  89  99  41   7  94  47  96  12  35  19  38   6  74  50  86  65
##  [91]  54  46  33  22  26  92  53  10  64  83

and

pt_idx
## [1]  5 54 58 95  8

then

sort(x, partial=pt_idx)
##  [1]   1   3   2   4   5   6   7   8  11  12   9  10  13  15  14  16  17  18  23  30  31  27  21  32  36  34  35  19  20  37
## [31]  38  33  29  22  26  25  24  28  39  41  40  42  43  48  46  44  45  47  51  50  52  49  53  54  57  56  55  58  59  60
## [61]  62  64  63  61  65  66  70  72  73  69  68  71  67  79  78  82  75  81  80  77  76  74  89  85  88  87  83  84  86  90
## [91]  92  93  91  94  95  96  97  99 100  98

Here x[5], x[54], ..., x[8] are placed in their correct positions - and we cannot say anything else about the remaining elements. HTH.

EDIT: Partial sorting may reduce the sorting time, of course if you are interested in e.g. finding only some of the order statistics.

require(microbenchmark)
x <- rnorm(100000)
microbenchmark(sort(x, partial=1:10)[1:10], sort(x)[1:10])
## Unit: milliseconds
##                           expr       min        lq    median        uq      max neval
##  sort(x, partial = 1:10)[1:10]  2.342806  2.366383  2.393426  3.631734 44.00128   100
##                  sort(x)[1:10] 16.556525 16.645339 16.745489 17.911789 18.13621   100