juliesls - 1 year ago 116

R Question

I'm having trouble with the `arulesSequences` package.

I have a transactional dataset with temporal information (here, let's use the default `zaki` dataset), and I mine its frequent sequences with `cspade`:

    library(arulesSequences)
    data(zaki)
    frequent_sequences <- cspade(zaki, parameter=list(support=0.5))

Now, what I want is to find, for each sequence (i.e. for each customer), which frequent subsequences it supports. I tried various combinations of `%in%` and `subset`, without success.

For example, for the second customer, the initial transactions are:

    > inspect(zaki[zaki@itemsetInfo$sequenceID==2])
      items   sequenceID eventID SIZE
    5 {A,B,F}          2      15    3
    6 {E}              2      20    1

The frequent sequences in the whole dataset are:

    > inspect(frequent_sequences)
       items              support
     1 <{A}>                 1.00
     2 <{B}>                 1.00
     3 <{D}>                 0.50
     4 <{F}>                 1.00
     5 <{A, F}>              0.75
     6 <{B, F}>              1.00
     7 <{D}, {F}>            0.50
     8 <{D}, {B, F}>         0.50
     9 <{A, B, F}>           0.75
    10 <{A, B}>              0.75
    11 <{D}, {B}>            0.50
    12 <{B}, {A}>            0.50
    13 <{D}, {A}>            0.50
    14 <{F}, {A}>            0.50
    15 <{D}, {F}, {A}>       0.50
    16 <{B, F}, {A}>         0.50
    17 <{D}, {B, F}, {A}>    0.50
    18 <{D}, {B}, {A}>       0.50

What I'd like to see is that customer 2 supports the frequent sequences 1, 2, 4, 5, 6, 9 and 10, but does not support the others.

I could also settle for the reverse information: which are the base sequences that support a given frequent subsequence? R somehow knows this information, since it uses it to compute the support of the frequent sequences.

It seems to me that this should be easy (and it probably is!) but I can't seem to figure it out...

Any ideas?

Answer Source

After some cool-headed digging, I found a way to do it, and indeed, it was easy... since the `support` function does the job!

```
ids <- unique(zaki@itemsetInfo$sequenceID)
encoding <- data.frame()
# Prepare the data.frame: as many columns as there are frequent sequences
for (seq_id in seq_len(length(frequent_sequences))) {
  encoding[, labels(frequent_sequences[seq_id])] <- logical(0)
}
# Fill the rows: restricted to a single customer, the absolute support of
# each frequent sequence is 1 if that customer contains it and 0 otherwise
for (id in ids) {
  transaction_subset <- zaki[zaki@itemsetInfo$sequenceID == id]
  encoding[id, ] <- as.logical(
    support(frequent_sequences, transaction_subset, type = "absolute")
  )
}
```

There might be more elegant ways to do this, but it yields the expected result:

```
> encoding
  <{A}> <{B}> <{D}> <{F}> <{A,F}> <{B,F}> <{D},{F}> <{D},{B,F}> <{A,B,F}>
1  TRUE  TRUE  TRUE  TRUE    TRUE    TRUE      TRUE        TRUE      TRUE
2  TRUE  TRUE FALSE  TRUE    TRUE    TRUE     FALSE       FALSE      TRUE
3  TRUE  TRUE FALSE  TRUE    TRUE    TRUE     FALSE       FALSE      TRUE
4  TRUE  TRUE  TRUE  TRUE   FALSE    TRUE      TRUE        TRUE     FALSE
  <{A,B}> <{D},{B}> <{B},{A}> <{D},{A}> <{F},{A}> <{D},{F},{A}> <{B,F},{A}>
1    TRUE      TRUE      TRUE      TRUE      TRUE          TRUE        TRUE
2    TRUE     FALSE     FALSE     FALSE     FALSE         FALSE       FALSE
3    TRUE     FALSE     FALSE     FALSE     FALSE         FALSE       FALSE
4   FALSE      TRUE      TRUE      TRUE      TRUE          TRUE        TRUE
  <{D},{B,F},{A}> <{D},{B},{A}>
1            TRUE          TRUE
2           FALSE         FALSE
3           FALSE         FALSE
4            TRUE          TRUE
```
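
For what it's worth, the same idea can probably be written more compactly by collecting the per-customer results with `vapply` into a logical matrix, which then answers both directions of the question. This is only a sketch: it reuses the calls shown above (`cspade`, `support`, `labels`, `inspect`), the names `encoding_matrix`, `supported_by_2` and `customers_for_7` are made up for illustration, and it assumes `sequenceID` holds plain integers as in `zaki`.

```r
library(arulesSequences)

data(zaki)
frequent_sequences <- cspade(zaki, parameter = list(support = 0.5))

ids <- unique(zaki@itemsetInfo$sequenceID)

# One row per customer: TRUE where that customer's own transactions
# contain the frequent sequence (absolute support of 1 instead of 0).
encoding_matrix <- t(vapply(ids, function(id) {
  customer <- zaki[zaki@itemsetInfo$sequenceID == id]
  as.logical(support(frequent_sequences, customer, type = "absolute"))
}, logical(length(frequent_sequences))))
dimnames(encoding_matrix) <- list(as.character(ids), labels(frequent_sequences))

# Forward lookup: indices of the frequent sequences supported by customer 2
supported_by_2 <- unname(which(encoding_matrix["2", ]))
inspect(frequent_sequences[supported_by_2])

# Reverse lookup: customers that support the 7th frequent sequence
customers_for_7 <- rownames(encoding_matrix)[encoding_matrix[, 7]]
```

Row `"2"` of `encoding_matrix` should match row 2 of the table above, and the reverse lookup is just a read down one column. A matrix is used here rather than a data.frame because every cell is logical and the sequence labels can be kept as column names without any renaming.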