ds_user ds_user - 10 months ago 41
R Question

Count on each group matching the criteria R

I have the below dataset, I am trying to find the number of days each machine was active and in sleep mode based on SwitchedOnDate.

MachineID InstalledDate SwitchedOnDate Status
1 2010-02-18 2010-02-19 SleepMode
1 2010-02-18 2010-02-20 Active
1 2010-02-18 2010-02-21 SleepMode
1 2010-02-18 2010-02-22 Active
2 2010-02-20 2010-02-21 Active
2 2010-02-20 2010-02-22 SleepMode
3 2010-02-10 2010-02-18 SleepMode
4 2010-03-10 2010-03-15 Active


So my output should look something like this,

MachineID SleepModeDays ActiveDays
1 2 2
2 1 1
3 1 0
4 0 1

Answer Source

You can use count to group and count occurrences of each level of the last variable you pass it, and then spread to rearrange the data to wide form:

library(tidyverse)

df %>% count(MachineID, Status) %>% spread(Status, n, fill = 0L)

## Source: local data frame [4 x 3]
## Groups: MachineID [4]
## 
##   MachineID Active SleepMode
## *     <int>  <int>     <int>
## 1         1      2         2
## 2         2      1         1
## 3         3      0         1
## 4         4      1         0

A base R alternative:

tab <- table(df$MachineID, df$Status)

data.frame(MachineID = rownames(tab), as.data.frame.matrix(tab))

##   MachineID Active SleepMode
## 1         1      2         2
## 2         2      1         1
## 3         3      0         1
## 4         4      1         0

or for a very direct route,

janitor::crosstab(df, MachineID, Status)

##   MachineID Active SleepMode
## 1         1      2         2
## 2         2      1         1
## 3         3      0         1
## 4         4      1         0

Note these approachs makes a few assumptions, e.g. that you can't have multiple occurrences of the same combination of MachineID, SwitchedOnDate, and Status. If your data gets more complicated, account for such.