Alban Couturier Alban Couturier - 3 months ago 9
R Question

Re-organize database by merging rows from a variable

I have a database that looks like this:

userId SessionId Screen Platform Version
01 1 first IOS 1.0.1
01 1 main IOS 1.0.1
01 2 first IOS 1.0.1
01 3 first IOS 1.0.1
01 3 main IOS 1.0.1
01 3 detail IOS 1.0.1
02 1 first Android 1.0.2


Basically what I intend to do is to determine if a "path" (different screens) leads to a better retention or not. I would like to re-organize each sessionId in one column. Ideal database would look like this:

userId SessionId Path Retention
01 1 first;main 3
01 2 first 3
01 3 first;main;detail 3
02 1 first 1


Here is the variable
Retention
would be equal to the maximum
SessionId
.

Answer

A possible solution in base R:

d2 <- aggregate(Screen ~ userId + SessionId, d, toString)
transform(d2, retention = ave(Screen, userId, FUN = length))

which gives:

> d2
  userId SessionId              Screen retention
1     01         1         first, main         3
2     02         1               first         1
3     01         2               first         3
4     01         3 first, main, detail         3

An alternative using dplyr:

library(dplyr)
d %>% 
  group_by(userId, SessionId) %>% 
  summarise(Screen = toString(Screen)) %>% 
  group_by(userId) %>% 
  mutate(retention = n())

which gives:

  userId SessionId              Screen retention
   <chr>     <int>               <chr>     <int>
1     01         1         first, main         3
2     01         2               first         3
3     01         3 first, main, detail         3
4     02         1               first         1