codeBarer codeBarer - 1 year ago 44
SQL Question

What is the efficient way of self joining date from an event stream?

I'm working on an event stream for the first time so self joining a table was something that I haven't done much at large scale. I'm trying to write SQL script to give me load time of chat lobby once the user has logged in. Below I have a sample script for an ETL. I want to know if there's a better way to do joins that what I have done. I have also thought of breaking the sub queries into two separate tables and doing a left join but not sure which way is considered optimal and better practice. I also thought of doing CTE.

DROP TABLE IF EXISTS event_stream;
create table event_stream
(
event varchar(50),
action varchar(100),
userid int,
session_id int,
date timestamp
);


INSERT INTO event_stream(event, action, userid, date,session_id) VALUES
('login', 'success', 1, '2017-08-15 12:30',100),
('chatlobby','loaded', 1, '2017-08-15 12:30:10',100),
('entered_chatroom','chatroom1',1,'8/15/2017 12:31',100),
('login', 'success', 54, '2017-08-15 01:30',101),
('chatlobby','loaded', 54, '2017-08-15 01:30:05',101),
('entered_chatroom','chatroom2',54,'8/15/2017 01:31',101);

Select a.userid, a.session_id, b.date - a.date as load_time
from
(Select * from event_stream where event = 'login' and action='success') a
JOIN
(Select * from event_stream where event = 'chatlobby' and action = 'loaded') b
ON a.session_id = b.session_id


The script above I created was done using PostgreSQL

Answer Source

The equivalent of your subqueries is:

from event_stream login join event_stream load on login.session_id = load.session.id
and login.action = 'success' 
and load.action = 'loaded'

This is simpler, and simplicity has its merits. To see which way performs faster, test them.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download