Akiru Akiru - 2 months ago 13
R Question

split a vector by percentile

I need to split a sorted vector of unknown length in R into "top 10%, ..., bottom 10%".
So, for example, if I have

vector <- order(c(1:98928))
, I want to split it into 10 different vectors, each one representing approximately 10% of the total length.

I've tried using
split <- split(vector, 1:10)
but since I don't know the length of the vector in advance, I get this warning whenever the length is not a multiple of 10:


data length is not a multiple of split variable


And even if the length is a multiple and the function works,
split()
does not keep the order of my original vector. This is what split() gives:

split(c(1:10) , 1:2)
$`1`
[1] 1 3 5 7 9

$`2`
[1] 2 4 6 8 10


And this is what I want:

$`1`
[1] 1 2 3 4 5

$`2`
[1] 6 7 8 9 10


I'm a newbie in R and I've been trying lots of things without success. Does anyone know how to do this?

Answer

Problem statement

Break a sorted vector x every 10% into 10 chunks.

Note that there are two interpretations of this:

  1. Cutting by vector index. In this case, the solution is:

    split(x, floor(10 * seq.int(0, length(x)-1, 1) / length(x)))
    
  2. Cutting by vector values. In this case, we have to split x by its sample quantiles. The solution is:

    split(x, cut(x, quantile(x, probs = 0:10 / 10, names = FALSE),
                 include.lowest = TRUE))
    
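As a minimal sketch, the index-based approach can be wrapped in a small helper and checked against the example from the question. The function name `split_by_index` and its argument `n` are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: split x into n contiguous chunks of roughly equal length.
split_by_index <- function(x, n = 10) {
  # Zero-based positions scaled to [0, n), then floored to a chunk id 0..n-1
  chunk_id <- floor(n * (seq_along(x) - 1) / length(x))
  split(x, chunk_id)
}

# The example from the question: two chunks that keep the original order
halves <- split_by_index(1:10, n = 2)
print(halves)

# Also works when the length is not a multiple of n
sizes <- lengths(split_by_index(seq_len(98928)))
print(sizes)
```

Because the chunk ids are non-decreasing along the vector, each chunk is a contiguous run of the input, so the original order is preserved within and across chunks.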

In the following, I will demonstrate both using this data:

set.seed(0); x <- round(rnorm(23),1)

In particular, the example data are normally distributed rather than uniformly distributed, so cutting by index and cutting by value give substantially different results.
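A quick way to see the difference is to compare chunk sizes under both interpretations. The following is a sketch that redefines the same `x` so it runs on its own; the names `by_index` and `by_value` are made up for illustration.

```r
set.seed(0); x <- round(rnorm(23), 1)

# Interpretation 1: contiguous chunks by position
by_index <- split(x, floor(10 * (seq_along(x) - 1) / length(x)))

# Interpretation 2: bins delimited by the sample deciles
by_value <- split(x, cut(x, quantile(x, probs = 0:10 / 10, names = FALSE),
                         include.lowest = TRUE))

# Number of elements per chunk under each interpretation
print(lengths(by_index))
print(lengths(by_value))
```

Index-based chunks always have nearly equal sizes, while value-based bins depend on how the values are distributed and can be uneven.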

Result

cutting by index

#$`0`
#[1]  1.3 -0.3  1.3

#$`1`
#[1] 1.3 0.4

#$`2`
#[1] -1.5 -0.9

#$`3`
#[1] -0.3  0.0  2.4

#$`4`
#[1]  0.8 -0.8

#$`5`
#[1] -1.1 -0.3

#$`6`
#[1] -0.3 -0.4  0.3

#$`7`
#[1] -0.9  0.4

#$`8`
#[1] -1.2 -0.2

#$`9`
#[1] 0.4 0.1

cutting by value

$`[-1.5,-1.06]`
[1] -1.5 -1.1 -1.2

$`(-1.06,-0.86]`
[1] -0.9 -0.9

$`(-0.86,-0.34]`
[1] -0.8 -0.4

$`(-0.34,-0.3]`
[1] -0.3 -0.3 -0.3 -0.3

$`(-0.3,-0.2]`
[1] -0.2

$`(-0.2,0.14]`
[1] 0.0 0.1

$`(0.14,0.4]`
[1] 0.4 0.3 0.4 0.4

$`(0.4,0.64]`
numeric(0)

$`(0.64,1.3]`
[1] 1.3 1.3 1.3 0.8

$`(1.3,2.4]`
[1] 2.4
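
One caveat with cutting by value, offered as a hedged aside: when the data contain many ties, several deciles can coincide, and `cut()` then stops with an error because its breaks are not unique. A possible workaround, assuming that ending up with fewer than 10 bins is acceptable, is to drop the duplicated breaks with `unique()`. The vector `y` below is made-up illustration data.

```r
# Made-up data with heavy ties: most deciles coincide at 1
y <- c(1, 1, 1, 1, 1, 1, 1, 1, 2, 3)
breaks <- quantile(y, probs = 0:10 / 10, names = FALSE)

# cut() refuses duplicated breaks; capture the error message instead of stopping
msg <- tryCatch(
  { cut(y, breaks, include.lowest = TRUE); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# Workaround sketch: keep only the distinct breaks, accepting fewer bins
bins <- split(y, cut(y, unique(breaks), include.lowest = TRUE))
print(lengths(bins))
```

If exactly 10 groups are required regardless of ties, cutting by index is the safer interpretation.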