Mike - 23 days ago
R Question

Creating Groups with Dplyr's "group_by" then Using Stringr to Find Differences Between Groups

Using the example below, I want to group the dataframe by CaseWorker, then Client, then determine for each Client group whether the list of tasks in "Task" is the same as the list of tasks in "Task2".

I would be happy with a simple TRUE or FALSE, or better yet, with each task that is in "Task2" but not in "Task" extracted and displayed in a new column or data frame.

So basically I need to make sure "Task" and "Task2" contain the same entries for each individual Client.

I would like to stick with Dplyr and Stringr if possible, or at least stay within the Tidyverse. I'm thinking there's some way of using "group_by" and "str_detect" or some other Stringr functionality to achieve this in an elegant manner.

CaseWorker<-c("John","John","John","John","John","John","Melanie","Melanie","Melanie","Melanie","Melanie","Melanie")
Client<-c("Chris","Chris","Chris","Tom","Tom","Tom","Valerie","Valerie","Valerie","Tim","Tim","Tim")
Task<-c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Make lunch","Make dinner","Feed cat","Buy groceries","Do homework","Iron shirt","Make lunch")
Task2<-c("Feed cat","Make dinner","Iron shirt","Make dinner","Do homework","Feed cat","Make dinner","Feed cat","Iron shirt","Do homework","Iron shirt","Make lunch")
Df<-data.frame(CaseWorker,Client,Task,Task2)

Answer

See if this is what you're after.

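First, check whether Task matches Task2 row by row. Where it does not, return Task2 in a new variable. I stored the result in a new data frame, df2. Note that this compares the two columns position by position, so it assumes each client's tasks are listed in the same order in both columns.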

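library(dplyr)

# Assumes Task and Task2 are character columns (the data.frame() default since R 4.0.0)
df2 <- Df %>% 
    mutate(match = Task == Task2,                    # TRUE where the row's tasks agree
           non_match = ifelse(!match, Task2, ""))    # keep Task2 only where they differ
df2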

#    CaseWorker  Client          Task       Task2 match  non_match
# 1        John   Chris      Feed cat    Feed cat  TRUE           
# 2        John   Chris   Make dinner Make dinner  TRUE           
# 3        John   Chris    Iron shirt  Iron shirt  TRUE           
# 4        John     Tom   Make dinner Make dinner  TRUE           
# 5        John     Tom   Do homework Do homework  TRUE           
# 6        John     Tom    Make lunch    Feed cat FALSE   Feed cat
# 7     Melanie Valerie   Make dinner Make dinner  TRUE           
# 8     Melanie Valerie      Feed cat    Feed cat  TRUE           
# 9     Melanie Valerie Buy groceries  Iron shirt FALSE Iron shirt
# 10    Melanie     Tim   Do homework Do homework  TRUE           
# 11    Melanie     Tim    Iron shirt  Iron shirt  TRUE           
# 12    Melanie     Tim    Make lunch  Make lunch  TRUE           

Then summarise the results to see if individual CaseWorker/Client pairs match for all entries.

df2 %>% 
   group_by(CaseWorker, Client) %>% 
   summarise(n = n(),
             matches = sum(match),
             all_match = n == matches)

#   CaseWorker  Client     n matches all_match
#        <chr>   <chr> <int>   <int>     <lgl>
# 1       John   Chris     3       3      TRUE
# 2       John     Tom     3       2     FALSE
# 3    Melanie     Tim     3       3      TRUE
# 4    Melanie Valerie     3       2     FALSE
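
If the tasks for a client might appear in a different order in the two columns, the row-by-row comparison above would report mismatches that are not real differences. A minimal sketch of an order-independent check follows, assuming dplyr 1.0.0 or later (for the .groups argument). The names set_check, same_tasks and missing_from_task are made up for this illustration, and base R's setequal() and setdiff() are used in place of stringr.

```r
library(dplyr)

# Same data as in the question, built with explicit column names
Df <- data.frame(
  CaseWorker = rep(c("John", "Melanie"), each = 6),
  Client     = rep(c("Chris", "Tom", "Valerie", "Tim"), each = 3),
  Task  = c("Feed cat", "Make dinner", "Iron shirt",
            "Make dinner", "Do homework", "Make lunch",
            "Make dinner", "Feed cat", "Buy groceries",
            "Do homework", "Iron shirt", "Make lunch"),
  Task2 = c("Feed cat", "Make dinner", "Iron shirt",
            "Make dinner", "Do homework", "Feed cat",
            "Make dinner", "Feed cat", "Iron shirt",
            "Do homework", "Iron shirt", "Make lunch"),
  stringsAsFactors = FALSE
)

set_check <- Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(
    # TRUE when both columns hold the same set of tasks, in any order
    same_tasks = base::setequal(Task, Task2),
    # tasks present in Task2 but absent from Task, collapsed into one string
    missing_from_task = paste(base::setdiff(Task2, Task), collapse = ", "),
    .groups = "drop"
  )

set_check
```

Because this treats each column as a set, duplicated tasks within a client are counted once. If duplicates matter, the row-by-row approach (after sorting within each group) may be the better fit.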

You could then of course merge this back into your data frame if you needed the all_match variable in your original dataset.
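
One way to do that merge is a left_join() on the two grouping columns. This is only a sketch, assuming dplyr 1.0.0 or later; group_summary and Df_flagged are names invented for the example, and all() is used here as a shorter equivalent of comparing n with the number of matches.

```r
library(dplyr)

# Same data as in the question, built with explicit column names
Df <- data.frame(
  CaseWorker = rep(c("John", "Melanie"), each = 6),
  Client     = rep(c("Chris", "Tom", "Valerie", "Tim"), each = 3),
  Task  = c("Feed cat", "Make dinner", "Iron shirt",
            "Make dinner", "Do homework", "Make lunch",
            "Make dinner", "Feed cat", "Buy groceries",
            "Do homework", "Iron shirt", "Make lunch"),
  Task2 = c("Feed cat", "Make dinner", "Iron shirt",
            "Make dinner", "Do homework", "Feed cat",
            "Make dinner", "Feed cat", "Iron shirt",
            "Do homework", "Iron shirt", "Make lunch"),
  stringsAsFactors = FALSE
)

# One row per CaseWorker/Client pair: TRUE only if every row matches
group_summary <- Df %>%
  group_by(CaseWorker, Client) %>%
  summarise(all_match = all(Task == Task2), .groups = "drop")

# Attach the group-level flag to every original row
Df_flagged <- Df %>%
  left_join(group_summary, by = c("CaseWorker", "Client"))

Df_flagged
```

If the summary table itself is not needed, a grouped mutate(all_match = all(Task == Task2)) should give the same column without a join.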