watchtower - 7 months ago 36

R Question

I have two data.frames--one look-up table that tells me a set products included in a group. Each group has *at least* one product of Type 1 and Type 2.

The second data.frame tells me details about the transaction. Each transaction can have one of the following products:

a) Only product**s** of Type 1 from one of the groups

b) Only product**s** of Type 2 from one of the groups

c) Product of Type 1 and Type 2 *from the same group*

For my analysis, I am interested in finding out c) above i.e. how many transactions have products of Type 1 *and* Type 2 (*from the same group*) sold. We will ignore the transaction altogether if Product of Type 1 and that of Type 2 from different groups that are sold in the same transaction.

Thus, each product of Type 1 or Type 2 MUST belong to the same group.

Here's my look up table:

`> P_Lookup`

Group ProductID1 ProductID2

Group1 A 1

Group1 B 2

Group1 B 3

Group2 C 4

Group2 C 5

Group2 C 6

Group3 D 7

Group3 C 8

Group3 C 9

Group4 E 10

Group4 F 11

Group4 G 12

Group5 H 13

Group5 H 14

Group5 H 15

For instance, I won't have Product G and Product 15 in one transaction because they belong to different group.

Here are the transactions:

`TransactionID ProductID ProductType`

a1 A 1

a1 B 1

a1 1 2

a2 C 1

a2 4 2

a2 5 2

a3 D 1

a3 C 1

a3 7 2

a3 8 2

a4 H 1

a5 1 2

a5 2 2

a5 3 2

a5 3 2

a5 1 2

a6 H 1

a6 15 2

Now, I was able to write code using

`dplyr`

Here's my code:

`P_Groups<-unique(P_Lookup$Group)`

Chosen_Group<-P_Groups[5]

P_Group_Ind <- P_Trans %>%

group_by(TransactionID)%>%

dplyr::filter((ProductID %in% unique(P_Lookup[P_Lookup$Group==Chosen_Group,]$ProductID1)) |

(ProductID %in% unique(P_Lookup[P_Lookup$Group==Chosen_Group,]$ProductID2)) ) %>%

mutate(No_of_PIDs = n_distinct(ProductType)) %>%

mutate(Group_Name = Chosen_Group)

P_Group_Ind<-P_Group_Ind[P_Group_Ind$No_of_PIDs>1,]

This works well as long as I manually select each group i.e. by setting

`Chosen_Group`

I'd sincerely appreciate any help. I have spent almost two days on this. I looked at using dplyr in for loop in r, but it seems this thread is talking about a different issue.

Here's

`dput`

`P_Trans`

`structure(list(TransactionID = c("a1", "a1", "a1", "a2", "a2",`

"a2", "a3", "a3", "a3", "a3", "a4", "a5", "a5", "a5", "a5", "a5",

"a6", "a6"), ProductID = c("A", "B", "1", "C", "4", "5", "D",

"C", "7", "8", "H", "1", "2", "3", "3", "1", "H", "15"), ProductType = c(1,

1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2)), .Names = c("TransactionID",

"ProductID", "ProductType"), row.names = c(NA, 18L), class = "data.frame")

Here's

`dput`

`P_Lookup`

`structure(list(Group = c("Group1", "Group1", "Group1", "Group2",`

"Group2", "Group2", "Group3", "Group3", "Group3", "Group4", "Group4",

"Group4", "Group5", "Group5", "Group5"), ProductID1 = c("A",

"B", "B", "C", "C", "C", "D", "C", "C", "E", "F", "G", "H", "H",

"H"), ProductID2 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,

14, 15)), .Names = c("Group", "ProductID1", "ProductID2"), row.names = c(NA,

15L), class = "data.frame")

Here's the

`dput()`

`structure(list(TransactionID = c("a1", "a1", "a1", "a2", "a2",`

"a2", "a3", "a3", "a3", "a3", "a4", "a5", "a5", "a5", "a5", "a5",

"a6", "a6", "a7"), ProductID = c("A", "B", "1", "C", "4", "5",

"D", "C", "7", "8", "H", "1", "2", "3", "3", "1", "H", "15",

"22"), ProductType = c(1, 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2,

2, 2, 2, 1, 2, 3)), .Names = c("TransactionID", "ProductID",

"ProductType"), row.names = c(NA, 19L), class = "data.frame")

Answer

Below is a tidyverse (dplyr, tidyr, and purrr) solution that I hope will help.

Note that the use of `map_df`

in the last line returns all results as a data frame. If you'd prefer it to be a list object for each group, then simply use `map`

.

```
library(dplyr)
library(tidyr)
library(purrr)
# Save unique groups for later use
P_Groups <- unique(P_Lookup$Group)
# Convert lookup table to product IDs and Groups
P_Lookup <- P_Lookup %>%
gather(ProductIDn, ProductID, ProductID1, ProductID2) %>%
select(ProductID, Group) %>%
distinct() %>%
nest(-ProductID, .key = Group)
# Bind Group information to transactions
# and group for next analysis
P_Trans <- P_Trans %>%
left_join(P_Lookup) %>%
unnest(Group) %>%
group_by(TransactionID)
# Iterate through Groups to produce results
map(P_Groups, ~ filter(P_Trans, Group == .)) %>%
map(~ mutate(., No_of_PIDs = n_distinct(ProductType))) %>%
map_df(~ filter(., No_of_PIDs > 1))
#> Source: local data frame [12 x 5]
#> Groups: TransactionID [4]
#>
#> TransactionID ProductID ProductType Group No_of_PIDs
#> <chr> <chr> <dbl> <chr> <int>
#> 1 a1 A 1 Group1 2
#> 2 a1 B 1 Group1 2
#> 3 a1 1 2 Group1 2
#> 4 a2 C 1 Group2 2
#> 5 a2 4 2 Group2 2
#> 6 a2 5 2 Group2 2
#> 7 a3 D 1 Group3 2
#> 8 a3 C 1 Group3 2
#> 9 a3 7 2 Group3 2
#> 10 a3 8 2 Group3 2
#> 11 a6 H 1 Group5 2
#> 12 a6 15 2 Group5 2
```

Source (Stackoverflow)