Aramis7d - 22 days ago
R Question

R grouping over combination of multiple columns

Considering the input dsam as:

dsam <- structure(list(a = structure(c(3L, 2L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L),
.Label = c("A", "B", "C"), class = "factor"), b = c(1,
1, 1, 1, 1, 3, 2, 3, 3, 1), c = structure(c(2L, 1L, 1L, 2L, 1L,
3L, 1L, 1L, 3L, 3L), .Label = c("D", "E", "F"), class = "factor")),
.Names = c("a", "b", "c"), row.names = c(NA, -10L), class = "data.frame")


I was trying to group by a and c and aggregate b within each group, keeping one record per group. But the two pieces of code below behave differently.
The original data has over 300 columns used for grouping, so explicitly typing out the column names is not an option; hence I am passing the column names as a character vector.

Method 1:

library(dplyr)

dsam %>%
  group_by(a, c) %>%
  mutate(rnk = row_number(), b = sum(b)) %>%
  filter(rnk == max(rnk)) %>% print()

#Source: local data frame [5 x 4]
#Groups: a, c [5]
#
# a b c rnk
# <fctr> <dbl> <fctr> <int>
#1 B 1 D 1
#2 C 2 E 2
#3 C 3 F 1
#4 A 7 D 4
#5 A 4 F 2


Method 2:

dsam %>%
group_by_(unlist(c("a","c"))) %>%
mutate(rnk = row_number(), b = sum(b)) %>%
filter( rnk == max(rnk)) %>% print()


#Source: local data frame [3 x 4]
#Groups: a [3]
#
# a b c rnk
# <fctr> <dbl> <fctr> <int>
#1 B 1 D 1
#2 C 5 F 3
#3 A 11 F 6


How can I make Method 2 behave like Method 1?

p.s. Due to the large number of columns used for grouping, I would prefer not to concatenate them together.
Thank you.

Answer

We need to pass the column names through the .dots argument of group_by_:

dsam %>% 
     group_by_(.dots = c("a", "c")) %>%
     mutate(rnk = row_number(), b = sum(b)) %>% 
     filter( rnk == max(rnk))
#      a     b      c   rnk
#  <fctr> <dbl> <fctr> <int>
#1      B     1      D     1
#2      C     2      E     2
#3      C     3      F     1
#4      A     7      D     4
#5      A     4      F     2

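Without .dots, the character vector is passed as a single argument and only its first element is used, so the data is grouped only by the first column, i.e. 'a'. That is why Method 2 reports "Groups: a [3]" instead of "Groups: a, c [5]".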
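
As a side note, the underscore verbs such as group_by_ are deprecated in newer dplyr releases. Below is a minimal sketch of the same grouping, assuming dplyr 1.0.0 or later (where across() and all_of() are available). The name group_cols is only illustrative, and building it with setdiff() assumes that every column except b is a grouping column, which may not hold for the real 300-column data.

```r
library(dplyr)

# Same data as in the question, written out as a plain data.frame
dsam <- data.frame(
  a = factor(c("C", "B", "A", "C", "A", "C", "A", "A", "A", "A")),
  b = c(1, 1, 1, 1, 1, 3, 2, 3, 3, 1),
  c = factor(c("E", "D", "D", "E", "D", "F", "D", "D", "F", "F"))
)

# Hypothetical helper vector: every column except the one being aggregated
group_cols <- setdiff(names(dsam), "b")

# Group by all columns named in the character vector
grouped <- dsam %>%
  group_by(across(all_of(group_cols)))

# Same pipeline as Method 1: sum b per group, keep the last row of each group
res <- grouped %>%
  mutate(rnk = row_number(), b = sum(b)) %>%
  filter(rnk == max(rnk)) %>%
  ungroup()

print(group_vars(grouped))  # grouping columns actually in effect
print(res)                  # one row per (a, c) combination
```

Because the grouping columns are supplied as a character vector, nothing has to be concatenated or typed out by hand, which matches the constraint in the question.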