nikosdi nikosdi - 2 months ago 5
R Question

data.frame Group By column

I have a data frame DF.

Say DF is:

A B
1 1 2
2 1 3
3 2 3
4 3 5
5 3 6


Now I want to combine together the rows by the column A and to have the sum of the column B.

For example:

A B
1 1 5
2 2 3
3 3 11


I am doing this currently using an SQL query with the sqldf function. But for some reason it is very slow. Is there any more convenient way to do that? I could do it manually too using a for loop but it is again slow. My SQL query is " Select A,Count(B) from DF group by A".

In general whenever I don't use vectorized operations and I use for loops the performance is extremely slow even for single procedures.

Answer

This is a common question. In base, the option you're looking for is aggregate. Assuming your data.frame is called "mydf", you can use the following.

> aggregate(B ~ A, mydf, sum)
  A  B
1 1  5
2 2  3
3 3 11

I would also recommend looking into the "data.table" package.

> library(data.table)
> DT <- data.table(mydf)
> DT[, sum(B), by = A]
   A V1
1: 1  5
2: 2  3
3: 3 11