javivr javivr - 3 months ago 9
R Question

Subsetting a table of two variables to remove the 0 values

I have a table such as the following sample, which is obtained from a df (0 == control; 1 == case; the column headings are strata):

    208 209 210 211 212 213
  0   4  16   3   5   2   0
  1   0   7   2   0   6   2


I need to create a new df that drops the strata with no cases (1) or no controls (0).

So far, I have written the following code, which creates a table of logical values:

table(df$status, df$strata)>0


but haven't managed to go further.

Answer

We can use subset

subset(df, strata %in% dimnames(tbl)[[2]][colSums(tbl==0)==0])
#  status strata
#1       0    211
#5       1    209
#7       0    209
#8       1    208
#9       1    211
#10      0    208

I think the question is not about checking whether 'df' is equal to 0. In fact, the OP wants to subset the dataset based on the frequencies in the table.
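To see how the column selection inside the subset() call works, here is a minimal sketch that rebuilds the sample table from the question by hand. The object names tbl_q, zero_cells and keep are only illustrative and are not part of the original code.

```r
# Rebuild the question's sample table by hand (rows: status, columns: strata).
# tbl_q, zero_cells and keep are illustrative names, not from the original post.
tbl_q <- matrix(c(4, 16, 3, 5, 2, 0,
                  0,  7, 2, 0, 6, 2),
                nrow = 2, byrow = TRUE,
                dimnames = list(status = c("0", "1"),
                                strata = as.character(208:213)))

# Count the zero cells in each stratum (column).
zero_cells <- colSums(tbl_q == 0)

# Keep only the strata without a zero cell, i.e. with at least
# one control and at least one case.
keep <- colnames(tbl_q)[zero_cells == 0]
keep
# [1] "209" "210" "212"
```

The resulting names are what the `strata %in% ...` condition is matched against, so strata 208, 211 and 213 would be dropped for the question's sample.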


A compact option would be to use data.table

library(data.table)
setDT(df)[, if(uniqueN(status)>1) .SD , by = .(strata)]
#    strata status
#1:    211      0
#2:    211      1
#3:    209      1
#4:    209      0
#5:    208      1
#6:    208      0

Here we convert the 'data.frame' to a 'data.table' (setDT(df)) and group by 'strata'. If the number of unique elements in 'status' within a group is greater than 1 (in this case 2), we return the Subset of Data.table (.SD) for that group.


An option using similar logic in dplyr is

library(dplyr)
df %>%
   group_by(strata) %>%
   filter(n_distinct(status)>1)
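
If neither package is available, the same group-wise idea can probably be written in base R with ave(). This is only a sketch on a small made-up data frame (toy is hypothetical and is not the data used in this answer):

```r
# Hypothetical toy data, made up for illustration only.
toy <- data.frame(status = c(0, 1, 0, 0, 1, 1),
                  strata = c(208, 208, 209, 209, 210, 211))

# For every row, count the distinct status values within its stratum.
n_status <- ave(toy$status, toy$strata,
                FUN = function(x) length(unique(x)))

# Keep the rows from strata that contain both a control and a case.
res <- toy[n_status > 1, ]
res
```

Here only stratum 208 contains both a 0 and a 1, so only its two rows are kept.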

data

set.seed(24)
df <- data.frame(status = sample(0:1, 10, replace=TRUE), 
           strata = sample(208:213, 10, replace = TRUE))

tbl <- table(df)