Fee Fee - 1 month ago 12
R Question

R: subset a dataframe by factor levels with a second condition

I am looking to create subsets of Contact.ID's occurring in certain Terms but not other.

To explain a bit better, this is a snapshot of my dataset:

df <- c(Contact.ID, Date.Time, Age, Gender, Attendance)

Contact.ID Date Time Age Gender Attendance Term
1 A 2012-10-06 18:54:48 37 Male 30 Term1
2 A 2013-03-12 20:50:18 37 Male 30 Term2
3 A 2013-05-24 20:18:44 37 Male 30 Term3
4 B 2012-11-15 16:58:15 27 Female 40 Term1
5 B 2012-12-23 10:57:02 27 Female 40 WB
6 B 2013-01-11 17:31:22 27 Female 40 Term2
7 B 2013-02-18 18:37:00 27 Female 40 Term2
8 C 2013-02-22 17:46:07 40 Male 5 Term2
9 C 2013-02-27 11:21:00 40 Male 5 Term2
10 D 2012-10-28 14:48:33 20 Female 12 Term1


My issue is, I need to create further segmentation dependent on Contact.ID's

So the groups I am looking to create are:


  • Term 1 Only --> ID's that are only present in term 1 but not in any other term (ID D for example)

  • Term 2 Only ---> ID's that are only present in term 2 but not in any other term (ID C for example)

  • Term 1 & 2 --> ID's that are only in term 1 and 2 not term 3(ID B for example)

  • Term 1 & 2 & 3 --> ID's that are present in all terms (ID A for example)



I have tried different ways of adding conditions to the
subset
and also tried
df[ which ()]
sort of function and
subset(df, () & () & !())
but I cant seem to get it right.

Any suggestions? I sincerely appreciate the help.

Answer Source

So i dont know what the WB is in your data set, but I think you can follow this code to fill in what you're looking for. We basically just need to filter on the unique number of terms that each Contact.ID are in and then make sure the terms are correct. I am not counting "WB" as one of the terms since it doesn't look like you are.

library(data.table)

dat = read.table("clipboard", header = TRUE)
setDT(dat)

dat[ , 'Num_Unique_Terms' := uniqueN(Term[Term != "WB"]), by = Contact.ID]

term1 = dat[Num_Unique_Terms == 1 & Term == "Term1"]
term2 = dat[Num_Unique_Terms == 1 & Term == "Term2"]
terms1and2 = dat[!Term %in% c("Term3") & Num_Unique_Terms == 2]
terms12and3 = dat[Num_Unique_Terms == 3]

> term1
   Contact.ID       Date     Time Age Gender Attendance  Term Num_Unique_Terms
1:          D 2012-10-28 14:48:33  20 Female         12 Term1                1
> term2
   Contact.ID       Date     Time Age Gender Attendance  Term Num_Unique_Terms
1:          C 2013-02-22 17:46:07  40   Male          5 Term2                1
2:          C 2013-02-27 11:21:00  40   Male          5 Term2                1
> terms1and2
   Contact.ID       Date     Time Age Gender Attendance  Term Num_Unique_Terms
1:          B 2012-11-15 16:58:15  27 Female         40 Term1                2
2:          B 2012-12-23 10:57:02  27 Female         40    WB                2
3:          B 2013-01-11 17:31:22  27 Female         40 Term2                2
4:          B 2013-02-18 18:37:00  27 Female         40 Term2                2

> terms12and3
   Contact.ID       Date     Time Age Gender Attendance  Term Num_Unique_Terms
1:          A 2012-10-06 18:54:48  37   Male         30 Term1                3
2:          A 2013-03-12 20:50:18  37   Male         30 Term2                3
3:          A 2013-05-24 20:18:44  37   Male         30 Term3                3