pineoclean pineoclean - 6 months ago 65
MySQL Question

mysql most popular page in 5 minute intervals

I have a MySQL table "page" that contains fields "page_name" and "page_timestamp". The table stores page and times requests that were made on a website. I'm trying to write a query that gives me the most hit page for every 5 min period.

Output like

Time Page Hits
12:00 index.html 34
12:05 page1.html 11
12:10 index.html 44


This is one attempt, but no output.

select pages_timestamp,
(select count( pages_name)
from pages t2
where UNIX_TIMESTAMP(t2.pages_timestamp) DIV 300 =
UNIX_TIMESTAMP(pages_timestamp)
group by pages_name
order by count(pages_name) desc
limit 1 )
from pages
where ...
group by UNIX_TIMESTAMP(pages_timestamp ) DIV 300


I'm sure there is a better approach

Answer

This assumes page_timestamp is a DATETIME type.

Here is a sample with some data I created, it helps to visualize the output.

So first off, lets create a function for time rounding, this will declutter our query:

drop function if exists rtime;
create function rtime (time DATETIME)
returns DATETIME

BEGIN
DECLARE newtime DATETIME;
set newtime = from_unixtime(floor(unix_timestamp(time)/300)*300);
return newtime;
END;

Next let's explore our data a little bit. I want to pull all Time, Page, and Count(Page_Name), grouping by interval and page.

SELECT
Time, Page, Hits 
from (
    select rtime(pages_timestamp) as time
    , page_name as page
    ,  count(page_name) as hits 
    from pages 
    group by page_name, rtime(pages_timestamp) 
    order by rtime(pages_timestamp), hits desc) g ;

This orders our aggregated table by time interval then the number of hits. Since our groups are ordered by most hits desc we can pull the first row. In MySQL we can non-aggregate non-group by columns (src. This gives us the first row per group, the one with the most hits. We just SELECT * from the above table grouping only by time:

select * from 
(select time, page, hits 
from (
     select rtime(pages_timestamp) as time
     , page_name as page
     ,  count(page_name) as hits 
     from pages 
     group by page_name, rtime(pages_timestamp) 
     order by rtime(pages_timestamp), hits desc
     ) g 
) h group by time;

Disclaimer!: If there is a tie, if two pages both have the most hits, this will only pull one record.