Akiru - 1 year ago 90
R Question

# Split a vector by percentile

I need to split a sorted vector of unknown length in R into "top 10%, ..., bottom 10%".
So, for example, if I have `vector <- order(c(1:98928))`, I want to split it into 10 different vectors, each one representing approximately 10% of the total length.

I've tried using `split <- split(vector, 1:10)`, but since I don't know the length of the vector, I get this warning whenever the length is not a multiple of 10:

data length is not a multiple of split variable

And even when the length is a multiple and the function works, `split()` does not keep the order of my original vector. This is what `split()` gives:

``````
split(c(1:10), 1:2)
$`1`
[1] 1 3 5 7 9

$`2`
[1]  2  4  6  8 10
``````

And this is what I want:

``````
$`1`
[1] 1 2 3 4 5

$`2`
[1]  6  7  8  9 10
``````

I'm a newbie in R and I've been trying lots of things without success. Does anyone know how to do this?

Answer Source

# Problem statement

Break a sorted vector `x` every 10% into 10 chunks.

Note that there are two interpretations of this:

1. Cutting by vector index. In this case, the solution is:

``````
split(x, floor(10 * seq.int(0, length(x) - 1, 1) / length(x)))
``````
2. Cutting by vector values. In this case, we have to split `x` by its sample quantiles. The solution is:

``````
split(x, cut(x, quantile(x, probs = 0:10 / 10, names = FALSE),
             include.lowest = TRUE))
``````
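
To check the index-based formula against the small case from the question, here is a minimal sketch. The wrapper name `split_by_index` and its argument `n` are hypothetical, introduced only for illustration; the body is the same expression as in option 1, with the constant 10 replaced by `n`.

```r
# Hypothetical helper: split `x` into `n` contiguous chunks by position.
split_by_index <- function(x, n = 10) {
  # Zero-based positions 0, 1, ..., length(x) - 1
  pos <- seq_along(x) - 1
  # Map each position to a chunk id in 0, ..., n - 1 and split on it
  split(x, floor(n * pos / length(x)))
}

# The two-chunk case from the question: the original order is kept
halves <- split_by_index(1:10, n = 2)
halves

# A length that is not a multiple of 10 also works
tenths <- split_by_index(seq_len(98928), n = 10)
lengths(tenths)
```

The chunks are named `"0"` to `"n - 1"` because the grouping variable is the floored chunk id; rename them with `names<-` if labels such as "top 10%" are preferred.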

In what follows, I demonstrate both approaches using this data:

``````
set.seed(0); x <- round(rnorm(23), 1)
``````

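In particular, the example data are normally distributed rather than uniformly distributed, so cutting by index and cutting by value give substantially different results. Note that `x` is not sorted here, so each chunk below keeps the elements in their original order.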
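
One quick way to see the difference is to compare the chunk sizes under the two schemes. This is a minimal sketch that reuses the two expressions from the problem statement on the same example data; the names `by_index`, `by_value` and `breaks` are mine, not part of the original answer.

```r
set.seed(0)
x <- round(rnorm(23), 1)

# Option 1: cut by position in the vector
by_index <- split(x, floor(10 * (seq_along(x) - 1) / length(x)))

# Option 2: cut by the sample deciles of the values
breaks <- quantile(x, probs = 0:10 / 10, names = FALSE)
by_value <- split(x, cut(x, breaks, include.lowest = TRUE))

# Number of elements in each chunk
lengths(by_index)
lengths(by_value)
```

Index-based chunks differ in size by at most one element, whereas value-based chunks depend on where the data fall relative to the deciles and can be uneven when values are tied.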

# Result

Cutting by index:

``````
#$`0`
#[1]  1.3 -0.3  1.3

#$`1`
#[1] 1.3 0.4

#$`2`
#[1] -1.5 -0.9

#$`3`
#[1] -0.3  0.0  2.4

#$`4`
#[1]  0.8 -0.8

#$`5`
#[1] -1.1 -0.3

#$`6`
#[1] -0.3 -0.4  0.3

#$`7`
#[1] -0.9  0.4

#$`8`
#[1] -1.2 -0.2

#$`9`
#[1] 0.4 0.1
``````

Cutting by value:

``````
$`[-1.5,-1.06]`
[1] -1.5 -1.1 -1.2

$`(-1.06,-0.86]`
[1] -0.9 -0.9

$`(-0.86,-0.34]`
[1] -0.8 -0.4

$`(-0.34,-0.3]`
[1] -0.3 -0.3 -0.3 -0.3

$`(-0.3,-0.2]`
[1] -0.2

$`(-0.2,0.14]`
[1] 0.0 0.1

$`(0.14,0.4]`
[1] 0.4 0.3 0.4 0.4

$`(0.4,0.64]`
numeric(0)

$`(0.64,1.3]`
[1] 1.3 1.3 1.3 0.8

$`(1.3,2.4]`
[1] 2.4
``````
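
The empty `(0.4,0.64]` chunk above comes from tied values. With heavier ties, several deciles can coincide, and `cut()` then rejects the duplicated breaks. The sketch below illustrates this on made-up data; deduplicating the breaks with `unique()` is one possible workaround and an assumption of mine, not something the original answer proposes, and it yields fewer than 10 chunks.

```r
# Made-up, heavily tied data: most deciles coincide at 1
x <- c(rep(1, 8), 2, 3)
breaks <- quantile(x, probs = 0:10 / 10, names = FALSE)

# cut() refuses duplicated breaks, so capture the error message
# instead of stopping the script
attempt <- tryCatch(
  split(x, cut(x, breaks, include.lowest = TRUE)),
  error = function(e) conditionMessage(e)
)
attempt

# Possible workaround (an assumption): drop duplicated breaks first.
# The result has fewer chunks than the 10 originally requested.
chunks <- split(x, cut(x, unique(breaks), include.lowest = TRUE))
lengths(chunks)
```

If exactly 10 chunks of near-equal size are required regardless of ties, cutting by index is the safer of the two interpretations.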