Rookie - 7 months ago 48

R Question

I have a data-frame likeso:

`x <-`

id1 id2 val1 val2 val3 val4

1 a x 1 9

2 a x 2 4

3 a y 3 5

4 a y 4 9

5 b x 1 7

6 b y 4 4

7 b x 3 9

8 b y 2 8

I wish to aggregate the above by id1 & id2. I want to be able to get the means for val1, val2, val3, val4 at the same time.

How do i do this?

This is what i currently have but it works just for 1 column:

`agg <- aggregate(x$val1, list(id11 = x$id1, id2= x$id2), mean)`

names(agg)[3] <- c("val1") # Rename the column

Also, how do i rename the columns which are outputted as means in the same statement given above

Answer

We can use the formula method of `aggregate`

. The variables on the 'rhs' of `~`

are the grouping variables while the `.`

represents all other variables in the 'df1' (from the example, we assume that we need the `mean`

for all the columns except the grouping), specify the dataset and the function (`mean`

).

```
aggregate(.~id1+id2, df1, mean)
```

Or we can use `summarise_each`

from `dplyr`

after grouping (`group_by`

)

```
library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_each(funs(mean))
```

Or another option is `data.table`

. We convert the 'data.frame' to 'data.table' (`setDT(df1)`

, grouped by 'id1' and 'id2', we loop through the subset of data.table (`.SD`

) and get the `mean`

.

```
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
```

```
df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b",
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
val1 = c(1L,
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L,
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
```