David Sargent David Sargent - 26 days ago 12
Apache Configuration Question

Why does sql query not output left join data in order by timestamp?

I have been working on a custom forum for a while now. I would like to display the recent posts from the forum. I am using PHP 7.0.8 and MySQL 5.7.16 on my Ubuntu server.

There are two tables that forum data is collected in.
uap4_forum_posts and uap4_forum_post_replies

Topics are saved to uap4_forum_posts.
Topic Replies are saved to uap4_forum_post_replies.

The goal it to get all new Topics and all Topic Replies if any and sort them by time stamps. I don't have any problems sorting Topics by date, but when I add Topic Replies it only uses the first Topic Reply. Therefore when there is more than one Topic Reply it does not show the newest one.

My database is setup as follows:
uap4_forum_posts
enter image description here
uap4_forum_post_replies
enter image description here

Here is the MySQL query I have so far:

SELECT sub.* FROM (
SELECT
fp.forum_post_id as forum_post_id, fp.forum_id as forum_id,
fp.forum_user_id as forum_user_id, fp.forum_title as forum_title,
fp.forum_edit_date as forum_edit_date,
fp.forum_timestamp as forum_timestamp, fpr.id as id,
fpr.fpr_post_id as fpr_post_id, fpr.fpr_id as fpr_id,
fpr.fpr_user_id as fpr_user_id, fpr.fpr_title as fpr_title,
fpr.fpr_edit_date as fpr_edit_date,
fpr.fpr_timestamp as fpr_timestamp,
GREATEST(fp.forum_timestamp, COALESCE(fpr.fpr_timestamp, '00-00-00 00:00:00')) AS tstamp
FROM uap4_forum_posts fp
LEFT JOIN uap4_forum_post_replies fpr
ON fp.forum_post_id = fpr.fpr_post_id
WHERE fp.allow = 'TRUE'
) sub
GROUP BY forum_post_id
ORDER BY tstamp DESC


Here are the results with the above query:
enter image description here

2016-11-08 13:36:46 Should be the most recent time stamp with the above data.

Thanks for any help provided. This is for a Free Open Source project I have been working on.

Answer

This question has been asked many times. It's a common "gotcha".

Note that the ORDER BY operation occurs after the GROUP BY operation.

Databases other than MySQL would throw an error with your query, along the lines of "non-aggregate expression in SELECT list not in GROUP BY clause".

A non-standard MySQL extension allows your query to run, but the values returned for some of the expressions in the SELECT list are indeterminate. The values will be from some row in the group, but there is no guarantee that it will be the latest, earliest, lowest, or be from any particular row in the group.

Reference: https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html

The results returned by your query are consistent with the documented behavior.


This question has been asked many times before. I recommend an SO search

 [mysql] group by wrong results

FOLLOWUP

If the (fpr_post_id,fpr_timestamp) tuple is unique in usp4_forum_post_replies ...

we could get the "latest" fpr_timestamp for each fpr_post_id with a query like this:

  SELECT mr.fpr_post_id
       , MAX(mr.fpr_timestamp) AS max_fpr_timestamp
    FROM uap4_forum_post_replies mr
   GROUP BY mr.fpr_post_id

(For a non-trivial table, for best performance, we want to have a suitable index available, e.g.

... ON uap4_forum_post_replies (fpr_post_id,fpr_timestamp)

(Any index with those two columns as the leading columns, in that order.)


Then we can use that query as an inline view, as a row source, in another query. For example:

  SELECT fp.forum_post_id     AS forum_post_id
       , fp.forum_id          AS forum_id
       , fp.forum_user_id     AS forum_user_id
       , fp.forum_title       AS forum_title
       , fp.forum_edit_date   AS forum_edit_date
       , fp.forum_timestamp   AS forum_timestamp
       , fpr.id               AS id
       , fpr.fpr_post_id      AS fpr_post_id
       , fpr.fpr_id           AS fpr_id
       , fpr.fpr_user_id      AS fpr_user_id
       , fpr.fpr_title        AS fpr_title
       , fpr.fpr_edit_date    AS fpr_edit_date
       , fpr.fpr_timestamp    AS fpr_timestamp
       , GREATEST(fp.forum_timestamp,COALESCE(fpr.fpr_timestamp,'0000-00-00 00:00:00')) 
                                  AS tstamp
    FROM uap4_forum_posts fp
    LEFT
    JOIN ( -- latest fpr_timestamp for each post
           SELECT mr.fpr_post_id
                , MAX(mr.fpr_timestamp) AS max_fpr_timestamp
             FROM uap4_forum_post_replies mr
            GROUP BY mr.fpr_post_id
         ) lr
    LEFT
    JOIN uap4_forum_post_replies fpr
      ON fpr.fpr_post_id = lr.fpr_post_id
     AND fpr.fpr_timestamp = lr.max_fpr_timestamp
   WHERE fp.allow = 'TRUE'
   ORDER BY tstamp DESC

With this, we are using the latest fpr_timestamp to retrieve only the rows with the matching fpr_timestamp, for each fpr_post_id.

Note that if there are duplicate fpr_timestamp values for a given fpr_post_id, there's a potential to return multiple rows. If that's not guaranteed to be unique, we can extend the approach. Assuming that id is unique, we could get the maximum id of the rows with the same latest fpr_timestamp, with a query like this:

  SELECT xr.fpr_post_id
       , MAX(xr.id) AS max_id
    FROM ( SELECT mr.fpr_post_id
                , MAX(mr.fpr_timestamp) AS max_fpr_timestamp
             FROM uap4_forum_post_replies mr
            GROUP BY mr.fpr_post_id
         ) lr
    JOIN uap4_forum_post_replies xr
      ON xr.fpr_post_id = lr.fpr_post_id
     AND xr.fpr_timestamp = lr.max_fpr_timestamp

Then in the outer query, we could join to uap4_forum_post_replies on the id column, to retrieve the row with that id value.

This just one example. There are other query patterns that will achieve an equivalent result.