useR - 2 months ago 5x
R Question

Argument "partial" of the sort function in R

`?sort`
states that the
`partial`
argument may be
`NULL`
or a vector of indices for partial sorting.

I tried:

``````x <- c(1,3,5,2,4,6,7,9,8,10)
sort(x)
## [1]  1  2  3  4  5  6  7  8  9 10
sort(x, partial=5)
## [1]  1  3  4  2  5  6  7  9  8 10
sort(x, partial=2)
## [1]  1  2  5  3  4  6  7  9  8 10
sort(x, partial=4)
## [1]  1  2  3  4  5  6  7  9  8 10
``````

I am not sure what
`partial`
means when sorting a vector.

As `?sort` states,

If partial is not NULL, it is taken to contain indices of elements of the result which are to be placed in their correct positions in the sorted array by partial sorting.

In other words, the following assertion is always true:

`````` stopifnot(sort(x, partial=pt_idx)[pt_idx] == sort(x)[pt_idx])
``````

for any `x` and `pt_idx`, e.g.

`````` x <- sample(100) # input vector
pt_idx <- sample(1:100, 5) # indices for partial arg
``````

This behavior is different from the one defined in the Wikipedia article on partial sorting. In R `sort()`'s case we are not necessarily computing k smallest elements.

For example, if

``````print(x)
##  [1]  91  85  63  80  71  69  20  39  78  67  32  56  27  79   9  66  88  23  61  75  68  81  21  90  36  84  11   3  42  43
##  [31]  17  97  57  76  55  62  24  82  28  72  25  60  14  93   2 100  98  51  29   5  59  87  44  37  16  34  48   4  49  77
##  [61]  13  95  31  15  70  18  52  58  73   1  45  40   8  30  89  99  41   7  94  47  96  12  35  19  38   6  74  50  86  65
##  [91]  54  46  33  22  26  92  53  10  64  83
``````

and

``````pt_idx
## [1]  5 54 58 95  8
``````

then

``````sort(x, partial=pt_idx)
##  [1]   1   3   2   4   5   6   7   8  11  12   9  10  13  15  14  16  17  18  23  30  31  27  21  32  36  34  35  19  20  37
## [31]  38  33  29  22  26  25  24  28  39  41  40  42  43  48  46  44  45  47  51  50  52  49  53  54  57  56  55  58  59  60
## [61]  62  64  63  61  65  66  70  72  73  69  68  71  67  79  78  82  75  81  80  77  76  74  89  85  88  87  83  84  86  90
## [91]  92  93  91  94  95  96  97  99 100  98
``````

Here `x[5]`, `x[54]`, ..., `x[8]` are placed in their correct positions - and we cannot say anything else about the remaining elements. HTH.

EDIT: Partial sorting may reduce the sorting time, of course if you are interested in e.g. finding only some of the order statistics.

``````require(microbenchmark)
x <- rnorm(100000)
microbenchmark(sort(x, partial=1:10)[1:10], sort(x)[1:10])
## Unit: milliseconds
##                           expr       min        lq    median        uq      max neval
##  sort(x, partial = 1:10)[1:10]  2.342806  2.366383  2.393426  3.631734 44.00128   100
##                  sort(x)[1:10] 16.556525 16.645339 16.745489 17.911789 18.13621   100
``````
Source (Stackoverflow)