pradeep andrewson2 - 6 months ago
SQL Question

How to fetch data without duplicate entries when using LIMIT and ORDER BY

Here we are using LIMIT for pagination.

This is my scenario:

I have posts with likes in my database.
I am fetching posts ordered by the highest number of likes, with LIMIT 0,5.
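
Roughly, the queries look like this (a sketch; the table name posts and the columns id, title, likes are placeholders, not my actual schema):

    -- First page: the five most-liked posts
    SELECT id, title, likes
    FROM posts
    ORDER BY likes DESC
    LIMIT 0, 5;

    -- Second page: the next five
    SELECT id, title, likes
    FROM posts
    ORDER BY likes DESC
    LIMIT 5, 5;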


The problem is:

We show the data ordered by most likes with LIMIT 0,5 for the first page, for example:


1) post:hello,Likes:8
2) post:hello1,Likes:8
3) post:hello2,Likes:7
4) post:hello3,Likes:6
5) post:hello4,Likes:5


Now for the second page I am calling LIMIT 5,5:

6) post:hello5,Likes:5
7) post:hello6,Likes:5
8) post:hello7,Likes:4
9) post:hello8,Likes:3
10) post:hello9,Likes:2


But if the post post:hello5 gets Likes:8 by the time I go to the second page, I get a duplicate.


When the first page was fetched there were only two posts with 8 likes, but by the time the second page is fetched there are three, so the LIMIT offsets shift and a post already fetched on page 1 appears again on page 2.
The output looks like this:


When running the first page:

1) post:hello,Likes:8
2) post:hello1,Likes:8
3) post:hello2,Likes:7
4) post:hello3,Likes:6
5) post:hello4,Likes:5


Second page

6) post:hello4,Likes:5
7) post:hello5,Likes:5
8) post:hello6,Likes:5
9) post:hello7,Likes:4
10) post:hello8,Likes:3


See, pages 1 and 2 now contain the same row (post:hello4,Likes:5), because by the time the second page is requested there are three posts with 8 likes, so the MySQL query with LIMIT 5,5 returns a row that was already fetched on page 1.


And the post (post:hello9,Likes:2) that should have appeared on page 2 gets pushed to the third page.

Answer

You should realize one thing:

this is probably what you want.

Consider loading page 1 at time X. How long will you spend reviewing the results from page 1 before going to page 2? Is it a few microseconds? Then it is very unlikely the order will change in that time. Is it a few minutes? Then the results will already be outdated by the time you view the next page. And how would you decide when it is time to refresh the search and show new, updated results?

If you still insist on solving this, I suggest creating a new table:

CREATE TABLE likes_search_orders (
    search_id VARCHAR(255) NOT NULL,
    post_id   BIGINT(20)   NOT NULL,
    order_id  BIGINT(20)   NOT NULL
);

Set a unique index on (search_id, post_id).
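
One way to create that index (a sketch, using the table defined above):

    ALTER TABLE likes_search_orders
        ADD UNIQUE KEY uq_search_post (search_id, post_id);

Then proceed with the following algorithm: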

  1. Create a random search_id (e.g. md5(rand(0,1000000)."-".time()) in PHP; see the SQL sketch after this list for a database-side alternative).

  2. Run (requires MySQL 8+ for ROW_NUMBER(); 'search_id' is the placeholder your application substitutes):

    INSERT INTO likes_search_orders (search_id, post_id, order_id)
    SELECT 'search_id', id, ROW_NUMBER() OVER (ORDER BY likes DESC)
    FROM table_with_likes;

  3. Use:

    SELECT l.*
    FROM table_with_likes l
    INNER JOIN likes_search_orders o
        ON o.post_id = l.id AND o.search_id = 'search_id'
    ORDER BY o.order_id ASC
    LIMIT 5, 5;
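
If you prefer to generate the search_id in MySQL rather than in application code, something like this works (a sketch; UUID() would do just as well):

    -- Produces a random 32-character hex string to use as search_id
    SELECT MD5(CONCAT(RAND(), UNIX_TIMESTAMP())) AS search_id;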

Every time you create a new search ID, you "freeze" the DB in the state it was in at that instant (at least the order of things; you could also include likes in likes_search_orders, and then you would have the exact like counts from that moment as well).
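
If you do want to snapshot the like counts as well, a sketch of that extension (assuming a likes column on table_with_likes, as above):

    ALTER TABLE likes_search_orders
        ADD COLUMN likes BIGINT(20) NOT NULL DEFAULT 0;

    INSERT INTO likes_search_orders (search_id, post_id, order_id, likes)
    SELECT 'search_id', id, ROW_NUMBER() OVER (ORDER BY likes DESC), likes
    FROM table_with_likes;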

I would also suggest creating a table of searches, where you store each search_id and the date it was created. You can then set up a cron job to clean up old searches so that they don't bloat the DB.
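
A sketch of that bookkeeping table and the cleanup the cron job could run (the names and the one-day retention are assumptions):

    CREATE TABLE likes_searches (
        search_id  VARCHAR(255) NOT NULL PRIMARY KEY,
        created_at DATETIME     NOT NULL
    );

    -- Run periodically, e.g. from cron: drop snapshots older than one day
    DELETE o FROM likes_search_orders o
    INNER JOIN likes_searches s ON s.search_id = o.search_id
    WHERE s.created_at < NOW() - INTERVAL 1 DAY;

    DELETE FROM likes_searches
    WHERE created_at < NOW() - INTERVAL 1 DAY;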
