Milhouse - 3 months ago
R Question

Creating variable by group (R)

My data looks something like this:

   CPUBID MPUBID  CSEX  CMOB  CYRB  twin twinfam
    <int>  <int> <int> <int> <int> <int>   <int>
1     201      2     2     3  1993     0       0
2     202      2     2    11  1994     0       0
3     301      3     2    10  1983     1       1
4     302      3     2    10  1983     1       1
5     303      3     2     4  1986     0       1
6     401      4     1     8  1980     0       0
7     403      4     2     3  1997     0       0
8     801      8     2     3  1976     0       0
9     802      8     1     5  1979     0       0
10    803      8     2     9  1982     0       0


dput() version:

structure(list(CPUBID = c(201L, 202L, 301L, 302L, 303L, 401L,
403L, 801L, 802L, 803L), MPUBID = c(2L, 2L, 3L, 3L, 3L, 4L, 4L,
8L, 8L, 8L), CSEX = c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L),
CMOB = c(3L, 11L, 10L, 10L, 4L, 8L, 3L, 3L, 5L, 9L), CYRB = c(1993L,
1994L, 1983L, 1983L, 1986L, 1980L, 1997L, 1976L, 1979L, 1982L
), twin = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), twinfam = c(0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L)), .Names = c("CPUBID",
"MPUBID", "CSEX", "CMOB", "CYRB", "twin", "twinfam"), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))


CPUBID is the individual's ID, MPUBID is the mother's ID, CSEX is sex, CMOB is month of birth, and so on. twin is a binary variable indicating that the individual is a twin. twinfam is the variable I'm trying to create: if any member of the household (same MPUBID) is a twin, this binary indicator should equal 1 for all members of that household.

I tried using:

df <- df %>% group_by(MPUBID) %>%
  mutate(twinfam = as.numeric(count(twin == 1) > 0))


but this gives me the error:

Error: no applicable method for 'group_by_' applied to an object of class "logical"


Any suggestions for a way to fix this, or perhaps a better route to creating the desired variable? Thanks.

Answer

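The error occurs because dplyr's count() expects a data frame, not a logical vector. To test whether any value in a group meets a condition, use any() instead. With data.table: convert the 'data.frame' to a 'data.table' with setDT(df1), then, grouped by 'MPUBID', check whether any value in 'twin' is not 0 and convert the resulting logical value to binary with as.integer().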

library(data.table)
setDT(df1)[, twinfam1 := as.integer(any(twin != 0)), by = MPUBID]

Or using dplyr with the same logic.

library(dplyr)
df1 %>%
  group_by(MPUBID) %>%
  mutate(twinfam = as.integer(any(twin != 0)))
#   CPUBID MPUBID  CSEX  CMOB  CYRB  twin twinfam
#    <int>  <int> <int> <int> <int> <int>   <int>
#1     201      2     2     3  1993     0       0
#2     202      2     2    11  1994     0       0
#3     301      3     2    10  1983     1       1
#4     302      3     2    10  1983     1       1
#5     303      3     2     4  1986     0       1
#6     401      4     1     8  1980     0       0
#7     403      4     2     3  1997     0       0
#8     801      8     2     3  1976     0       0
#9     802      8     1     5  1979     0       0
#10    803      8     2     9  1982     0       0
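
If you would rather avoid extra packages, the same "any twin in the group" logic can probably be written in base R with ave(). This is only a sketch, not part of the original answer: df1 below is a cut-down copy of the question's data with just the columns needed, rebuilt so the snippet runs on its own.

```r
# Base R sketch: flag every child whose mother (MPUBID) has at least one twin.
# df1 is an illustrative subset of the question's data (ID, mother ID, twin).
df1 <- data.frame(
  CPUBID = c(201L, 202L, 301L, 302L, 303L, 401L, 403L, 801L, 802L, 803L),
  MPUBID = c(2L, 2L, 3L, 3L, 3L, 4L, 4L, 8L, 8L, 8L),
  twin   = c(0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L)
)

# ave() applies FUN within each MPUBID group and returns a vector as long as
# the input, so the single group-level value is repeated for every member.
df1$twinfam <- ave(df1$twin, df1$MPUBID,
                   FUN = function(x) as.integer(any(x != 0)))

df1$twinfam
# 0 0 1 1 1 0 0 0 0 0
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated by ave() as a grouping variable.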