Richard Herron - 6 months ago 23

R Question

I see a lot of questions and answers re

`order`

`sort`

Here's my attempt:

`temp <- data.frame(name=letters[1:12], value=rnorm(12), quartile=rep(NA, 12))`

temp

# name value quartile

# 1 a 2.55118169 NA

# 2 b 0.79755259 NA

# 3 c 0.16918905 NA

# 4 d 1.73359245 NA

# 5 e 0.41027113 NA

# 6 f 0.73012966 NA

# 7 g -1.35901658 NA

# 8 h -0.80591167 NA

# 9 i 0.48966739 NA

# 10 j 0.88856758 NA

# 11 k 0.05146856 NA

# 12 l -0.12310229 NA

temp.sorted <- temp[order(temp$value), ]

temp.sorted$quartile <- rep(1:4, each=12/4)

temp <- temp.sorted[order(as.numeric(rownames(temp.sorted))), ]

temp

# name value quartile

# 1 a 2.55118169 4

# 2 b 0.79755259 3

# 3 c 0.16918905 2

# 4 d 1.73359245 4

# 5 e 0.41027113 2

# 6 f 0.73012966 3

# 7 g -1.35901658 1

# 8 h -0.80591167 1

# 9 i 0.48966739 3

# 10 j 0.88856758 4

# 11 k 0.05146856 2

# 12 l -0.12310229 1

Is there a better (cleaner/faster/one-line) approach? Thanks!

Answer

The method I use is one of these or `Hmisc::cut2(value, g=4)`

:

```
temp$quartile <- with(temp, cut(value,
breaks=quantile(value, probs=seq(0,1, by=0.25), na.rm=TRUE),
include.lowest=TRUE))
```

An alternate might be:

```
temp$quartile <- with(temp, factor(
findInterval( val, c(-Inf,
quantile(val, probs=c(0.25, .5, .75)), Inf) , na.rm=TRUE),
labels=c("Q1","Q2","Q3","Q4")
))
```

The first one has the side-effect of labeling the quartiles with the values, which I consider a "good thing", but if it were not "good for you", or the valid problems raised in the comments were a concern you could go with version 2. You can use `labels=`

in `cut`

, or you could add this line to your code:

```
temp$quartile <- factor(temp$quartile, levels=c("1","2","3","4") )
```

Or even quicker but slightly more obscure in how it works, although it is no longer a factor, but rather a numeric vector:

```
temp$quartile <- as.numeric(temp$quartile)
```