EDITED: added full query by request.
In essence I have a table of posts linked one to many to a table of reposts, akin to Twitter. I want to load the posts ordered by the time of the repost (if present) or the time of the original post. However, the ordering process is very slow using a single query (probably do the the fact that COALESCE(x, y) doesn't make full use of MySQL indexes). The time column on both relevant tables is indexed.
My query looks something like this.
SELECT * FROM Post p LEFT JOIN p.reposts ON ... WHERE ...
ORDER BY COALESCE(r.time, p.time) LIMIT 0, 10
SELECT * FROM Post p LEFT JOIN p.reposts repost ON (p.id = repost.post_id AND
repost.time = (
SELECT MIN(r.time) FROM Repost r WHERE p.id = r.post_id
AND r.user_id IN (1, 2, 3...) AND r.user_id NOT IN (4, 5, 6...))
WHERE (repost IS NOT NULL OR p.author_id IN (1, 2, 3...))
AND p.author_id NOT IN (4, 5, 6...)
ORDER BY COALESCE(repost.time, p.time) LIMIT 0, 10
...ORDER BY repost.time DESC
...WHERE repost.id IS NOT NULL...
... ORDER BY CASE WHEN repost.id IS NULL p.time ELSE repost.time END DESC
SELECT * FROM Repost r LEFT JOIN r.post ON ... WHERE ... ORDER BY r.time DESC
The problem here was as I described in update 2 of my question. MySQL uses indexes to perform ORDER BY operations quickly. More specifically, MySQL uses B-trees to index columns (such as timestamps - p.time/r.time), which use up a bit more space but allow for faster sorting.
The issue with my query was that it was sorting by the time column in two tables, using the timestamp from the repost table if available, and the post table otherwise. Since MySQL can't combine the B-trees from both tables, it can't perform fast index sorting on columns from two different tables.
I modified my query and table structure in two ways to solve this.
1) Perform filtering based on blocked users first, so ordering only has to be done on posts that are accessible by the current user. This was not the root of the problem, but is practical optimization. e.g.
SELECT * FROM (SELECT * FROM Post p WHERE p.author_id NOT IN (4, 5, 6...))...
2) Treat every post as a repost by its author, so every post is guaranteed to have a joinable repost and repost.time on which to index and sort. e.g.
SELECT * FROM (...) LEFT JOIN p.reposts repost ON (p.id = repost.post_id AND repost.time = ( SELECT MIN(r.time) FROM Repost r WHERE p.id = r.post_id AND r.user_id IN (1, 2, 3...) AND r.user_id NOT IN (4, 5, 6...)) )) WHERE (repost.id IS NOT NULL) ORDER BY repost.time DESC LIMIT 0, 10
At the end of the day the issue came down to ORDER BY - this approach reduced the query time from about 8 seconds to 20 ms.