user1 - 1 year ago 64
R Question

# Sorting data frame by column, adding index within group

This question describes the setting for my question pretty well.

Instead of a second value, however, I have a factor called `algorithm`. My data frame looks like the following (note that values can repeat, even within a group):

``````
algorithm <- c("global", "distributed", "distributed", "none", "global", "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)
df
#     algorithm  v
# 1      global  5
# 2 distributed  2
# 3 distributed  6
# 4        none  7
# 5      global  3
# 6      global  1
# 7 distributed 10
# 8        none  2
# 9        none  2
``````

I would like to sort the data frame by `v` but get the ordering position of every entry within its group (`algorithm`). This position should then be added to the original data frame (so I don't need to rearrange it), because I would like to plot the calculated position as x and the value as y using ggplot (grouped by algorithm, i.e. every algorithm is one set of points).

So the result should look like this:

``````
    algorithm  v groupIndex
1      global  5          3
2 distributed  2          1
3 distributed  6          2
4        none  7          3
5      global  3          2
6      global  1          1
7 distributed 10          3
8        none  2          1
9        none  2          2
``````

So far I know I can order the data by algorithm first and then by value or the other way round. I guess in a second step I would have to calculate the index within each group? Is there an easy way to do that?

``````
df[order(df$algorithm, df$v), ]
#     algorithm  v
# 2 distributed  2
# 3 distributed  6
# 7 distributed 10
# 6      global  1
# 5      global  3
# 1      global  5
# 8        none  2
# 9        none  2
# 4        none  7
``````

Edit: It is not guaranteed that every group has the same number of entries!

A double application of `order` in each group should cover it:

``````
ave(df$v, df$algorithm, FUN = function(x) order(order(x)))
#[1] 3 1 2 3 2 1 3 1 2
``````
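To see why the double `order` works: the inner `order(x)` returns the indices that would sort `x`, and ordering that permutation inverts it, which gives each element's position in the sorted sequence. Below is a small sketch using the `none` group from the question, followed by assigning the result back to the original data frame. The column name `groupIndex` is taken from the question; `sort_idx` and `position` are illustrative names only.

```r
# Values of the "none" group from the question, in original row order
x <- c(7, 2, 2)

sort_idx <- order(x)         # indices that would sort x: 2 3 1 (ties keep original order)
position <- order(sort_idx)  # inverse permutation, i.e. each element's position: 3 1 2

# Rebuild the question's data and attach the per-group position as a new column
algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

df$groupIndex <- ave(df$v, df$algorithm, FUN = function(x) order(order(x)))
df$groupIndex
# 3 1 2 3 2 1 3 1 2
```

Because `ave` applies the function to each group separately and returns the results in the original row order, the groups do not need to have the same number of entries and the data frame does not need to be rearranged.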

This is equivalent to:

``````
ave(df$v, df$algorithm, FUN = function(x) rank(x, ties.method = "first"))
#[1] 3 1 2 3 2 1 3 1 2
``````
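The `ties.method = "first"` argument matters here because the data contains repeated values within a group. As a small illustration (the variable names `r_first`, `r_min` and `r_avg` are made up for this sketch), compare how `rank` treats the tie in the `none` group under different methods:

```r
x <- c(7, 2, 2)  # the "none" group from the question, which contains a tie

r_first <- rank(x, ties.method = "first")    # 3 1 2: ties broken by order of appearance
r_min   <- rank(x, ties.method = "min")      # 3 1 1: tied values share the lowest rank
r_avg   <- rank(x, ties.method = "average")  # 3.0 1.5 1.5: the default, ranks are averaged
```

Only `"first"` gives every entry a distinct position, which is what the expected `groupIndex` column in the question shows.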

This in turn means you can take advantage of `frank` from `data.table` if you are concerned about speed:

``````
library(data.table)

# setDT converts df to a data.table by reference; := adds the column in place
setDT(df)[, grpidx := frank(v, ties.method = "first"), by = algorithm]
df
#     algorithm  v grpidx
#1:      global  5      3
#2: distributed  2      1
#3: distributed  6      2
#4:        none  7      3
#5:      global  3      2
#6:      global  1      1
#7: distributed 10      3
#8:        none  2      1
#9:        none  2      2
``````
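The question's stated goal was to plot the within-group position as x against the value as y, with one set of points per algorithm. Here is a minimal sketch of that last step, assuming the `ggplot2` package is installed. It rebuilds the data as a plain data frame and uses the base R `ave` approach, so it does not depend on `data.table`. The column name `groupIndex` follows the question; the object name `p` and the choice of colour as the grouping aesthetic are assumptions made for illustration.

```r
library(ggplot2)

algorithm <- c("global", "distributed", "distributed", "none", "global",
               "global", "distributed", "none", "none")
v <- c(5, 2, 6, 7, 3, 1, 10, 2, 2)
df <- data.frame(algorithm, v)

# Position of each value within its algorithm group, ties broken by appearance
df$groupIndex <- ave(df$v, df$algorithm,
                     FUN = function(x) rank(x, ties.method = "first"))

# One set of points per algorithm, distinguished by colour
p <- ggplot(df, aes(x = groupIndex, y = v, colour = algorithm)) +
  geom_point()
print(p)
```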