user2205916 user2205916 - 4 months ago 13
SQL Question

How do I calculate retention using SQL?

I am trying to calculate the 1-day, 7-day, and 2-week retention rates but I'm stuck. A table called

users
has the date of the first login. A table called
logins
has the dates of all subsequent logins.

My thinking was to find the "max," or most recent login from the
logins
table and find how many days this was since the user's first login. Eventually, I would want to group by first login date and count the number of users from this "cohort" who logged into the app 1 day later, 7 days later, and 14 days later. I feel like I'm close but it's not quite there.

Below is what I have so far.

DESCRIBE users;
Field Type Null Key Default Extra
---------------- ----------- ---- --- ------------------- ---------------------------
uid int(11) YES MUL (null)
device_id varchar(64) YES (null)
install_ts timestamp NO CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
firstlogin_ts timestamp NO 0000-00-00 00:00:00
firstpurchase_ts timestamp NO 0000-00-00 00:00:00

DESCRIBE logins;
Field Type Null Key Default Extra
----------- ----------- ---- --- ----------------- ---------------------------
uid int(11) NO (null)
device_id varchar(64) NO (null)
login_ts timestamp NO CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP
login_count int(11) NO (null)
level int(11) NO (null)


SELECT logins.uid,
MAX(CAST(login_ts AS Date)) AS login_dt,
CAST(firstlogin_ts AS Date) AS firstlogin_dt,
MAX(DATEDIFF(CAST(login_ts AS Date),
CAST(firstlogin_ts AS Date))) AS retentionDays
FROM logins
LEFT JOIN users
ON logins.uid = users.uid
GROUP BY logins.uid
ORDER BY logins.uid, login_dt

uid login_dt firstlogin_dt retentionDays
------ ---------- ------------- -------------
121043 2015-01-04 (null) (null)
121044 2015-01-04 (null) (null)
121045 2015-01-06 2015-01-01 5
121046 2015-01-05 2015-01-01 4
121047 2015-01-04 2015-01-01 3
121049 2015-01-05 2015-01-01 4
121050 2015-01-02 2015-01-01 1
121054 2015-01-04 2015-01-01 3
121055 2015-01-15 2015-01-01 14
121056 2015-01-07 2015-01-01 6
121057 2015-01-12 2015-01-01 11
121058 2015-01-02 2015-01-01 1
121060 2015-01-11 2015-01-01 10
121063 2015-01-07 2015-01-01 6
121065 2015-01-05 2015-01-01 4
121066 2015-01-07 2015-01-01 6
121067 2015-01-03 2015-01-01 2
121069 2015-01-03 2015-01-01 2
121070 2015-01-06 2015-01-01 5


Any hints would be appreciated.

Answer

I'm going to assume the query in your question is correct, and you just need help to take it one step further.

It looks like what you want to do is select from the results of your first query, and count the number of users with retentionDays >= 1, the number of users with retentionDays >= 7 and the number of users with retentionDays >= 14.

Translated in SQL, that gives a query like:

SELECT
  SUM(IF(retentionDays >= 1, 1, 0)) as 1day
  SUM(IF(retentionDays >= 7, 1, 0)) as 7days
  SUM(IF(retentionDays >= 14, 1, 0)) as 14days
FROM (
  /* your previous query */
) as computedRetentionDays;

If you want a ratio instead of values, you can divide each sum by COUNT(uid).

That being said, I suggest you add a column lastlogin_ts on the users, that will make your life much easier (and this query much faster).

Comments