user1 - 3 months ago 13

R Question

This question describes the setting for my question pretty well.

Instead of a second value, however, I have a factor called `algorithm`:

`algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")`

v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)

df <- data.frame(algorithm, v)

df

algorithm v

1 global 5

2 distributed 2

3 distributed 6

4 none 7

5 global 3

6 global 1

7 distributed 10

8 none 2

9 none 2

I would like to rank the rows by `v` within each `algorithm` group, while keeping the original row order. The result should look like this:

algorithm v groupIndex

1 global 5 3

2 distributed 2 1

3 distributed 6 2

4 none 7 3

5 global 3 2

6 global 1 1

7 distributed 10 3

8 none 2 1

9 none 2 2

So far I know I can order the data by algorithm first and then by value or the other way round. I guess in a second step I would have to calculate the index within each group? Is there an easy way to do that?

`df[order(df$algorithm, df$v), ]`

algorithm v

2 distributed 2

3 distributed 6

7 distributed 10

6 global 1

5 global 3

1 global 5

8 none 2

9 none 2

4 none 7

Answer

A double application of `order` in each group should cover it:

```
ave(df$v, df$algorithm, FUN=function(x) order(order(x)) )
#[1] 3 1 2 3 2 1 3 1 2
```
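To see why applying `order` twice yields a rank, here is a minimal sketch on a single vector. The values are hypothetical and chosen only for illustration; they are not taken from the question's data.

```r
# Hypothetical values, chosen only to illustrate the idea
x <- c(7, 2, 10, 5)

o <- order(x)   # positions of the elements in sorted order
o
#[1] 2 4 1 3

r <- order(o)   # rank of each element, reported at its original position
r
#[1] 3 1 4 2
```

The first `order` says where each sorted value comes from; the second inverts that permutation, which gives each original element its position in the sorted sequence.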

Which is also equivalent to:

```
ave(df$v, df$algorithm, FUN=function(x) rank(x,ties.method="first") )
#[1] 3 1 2 3 2 1 3 1 2
```
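The `ties.method="first"` argument matters for the "none" group, which contains a tie. A small sketch on that group's values shows the difference from the default, and that it matches the double `order`:

```r
# Values of the "none" group from the question, in original row order
x <- c(7, 2, 2)

r_avg <- rank(x)                            # default ties.method = "average"
r_avg
#[1] 3.0 1.5 1.5

r_first <- rank(x, ties.method = "first")   # ties broken by order of appearance
r_first
#[1] 3 1 2

r_oo <- order(order(x))                     # same result as ties.method = "first"
r_oo
#[1] 3 1 2
```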

This in turn means you can take advantage of `frank` from `data.table` if you are concerned about speed:

```
library(data.table)
setDT(df)[, grpidx := frank(v,ties.method="first"), by=algorithm]
df
# algorithm v grpidx
#1: global 5 3
#2: distributed 2 1
#3: distributed 6 2
#4: none 7 3
#5: global 3 2
#6: global 1 1
#7: distributed 10 3
#8: none 2 1
#9: none 2 2
```
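If `data.table` is not available, the same column can be stored with base R alone. The sketch below rebuilds the question's data, attaches the within-group rank as `groupIndex` (the column name used in the question), and then optionally sorts by group and position. The name `sorted` is only an illustrative choice.

```r
# Rebuild the example data from the question
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

# Store the within-group rank as a new column (base R only)
df$groupIndex <- ave(df$v, df$algorithm,
                     FUN = function(x) rank(x, ties.method = "first"))

# Optionally sort by group, then by position within the group
sorted <- df[order(df$algorithm, df$groupIndex), ]
```

Here `df` keeps its original row order with the new column added, while `sorted` lists each group's rows from smallest to largest `v`.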