Lukas Grebe - 3 months ago
R Question

Aggregate (count) rows that match a condition, group by unique values

It seems like such a simple problem, yet I've been pulling my hair out trying to get this to work:

Given this data frame identifying the interactions id had with contact, who is grouped by contactGrp:

head(data)
id sesTs contact contactGrp relpos maxpos
1 6849 2012-06-25 15:58:34 peter west 0.000000 3
2 6849 2012-06-25 18:24:49 sarah south 0.500000 3
3 6849 2012-06-27 00:13:30 sarah south 1.000000 3
4 1235 2012-06-29 17:49:35 peter west 0.000000 2
5 1235 2012-06-29 23:56:35 peter west 1.000000 2
6 5893 2012-06-30 22:21:33 carl east 0.000000 1


How many contacts were there for each value of unique(data$contactGrp) with relpos == 1 and maxpos > 1?

The expected result would be:

1 west 1
2 south 1
3 east 0


A small subset of the lines I have tried:


  • aggregate(data, by=list('contactGrp'), FUN=count) yields an error, and does no filtering

  • using data.table seems to require a key, which is not unique in this data…

  • ddply(data,"contactGrp",summarise,count=???): I'm not sure which function to use to fill the count column

  • ddply(subset(data,maxpos>1 & relpos==0), c('contactGrp'), function(df)count(df$relpos)) works, but gives me an extra column x, and it feels like I've overcomplicated it…
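The first attempt can be made to work in base R by counting with a logical indicator instead of count(). A minimal sketch, assuming the six sample rows from head(data) (rebuilt with only the columns needed) and a hypothetical helper column named hit: aggregate() sums the indicator per contactGrp, and because no rows are filtered out beforehand, east still shows up with 0.

```r
# Rebuild the sample rows from head(data); only the columns used below
data <- data.frame(
  contact    = c("peter", "sarah", "sarah", "peter", "peter", "carl"),
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: TRUE where the row meets the condition.
# No rows are dropped, so every contactGrp survives the grouping step.
data$hit <- data$relpos == 1 & data$maxpos > 1

# sum() over a logical vector counts its TRUE values, per contactGrp
result <- aggregate(hit ~ contactGrp, data = data, FUN = sum)
names(result) <- c("contactGrp", "count")

print(result)
```

aggregate() sorts the groups, so this should list east 0, south 1 and west 1 rather than following the order of unique(data$contactGrp).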



SQL would be easy:
Select contactGrp, count(*) as cnt from data where … Group by contactGrp
but I'm trying to learn R.
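For comparison, the SQL query maps fairly directly onto base R: row subsetting plays the role of WHERE, and table() plays the role of GROUP BY with COUNT(*). This sketch uses the same sample rows and assumes the elided WHERE clause is the condition from the question (relpos == 1 and maxpos > 1). Filtering first drops groups with no matching rows, in SQL as in R, so the factor levels are fixed to every group in the full data to keep east at 0.

```r
# Sample rows from head(data); only the columns used below
data <- data.frame(
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# WHERE relpos = 1 AND maxpos > 1 (assumed condition, taken from the question)
hits <- data[data$relpos == 1 & data$maxpos > 1, ]

# GROUP BY contactGrp + COUNT(*): table() counts rows per factor level.
# Declaring every group as a level keeps groups with no hits (east) as 0.
counts <- table(factor(hits$contactGrp, levels = unique(data$contactGrp)))

print(counts)
```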

Answer

I think this is the ddply version you're looking for:

library(plyr)

# Evaluate the condition inside each group, so groups with no matches get 0
ddply(data, .(contactGrp),
      summarise,
      count = length(contact[relpos == 1 & maxpos > 1]))
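ddply() follows a split-apply-combine pattern. The same computation can be sketched in base R with split() and sapply(), which needs no extra package. As in the ddply version, the condition is evaluated inside each group rather than before grouping, which is why east comes out as 0 instead of disappearing. This assumes the same six sample rows as the question.

```r
# Sample rows from head(data); only the columns used below
data <- data.frame(
  contactGrp = c("west", "south", "south", "west", "west", "east"),
  relpos     = c(0, 0.5, 1, 0, 1, 0),
  maxpos     = c(3, 3, 3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Split: one data frame per contactGrp (split() orders groups by name)
groups <- split(data, data$contactGrp)

# Apply + combine: count matching rows in each group into a named vector
counts <- sapply(groups, function(d) sum(d$relpos == 1 & d$maxpos > 1))

print(counts)
```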