Andrew Newby Andrew Newby - 6 months ago 14
SQL Question

Way to check if the same URL exists for another user in mySQL

First of all, let me explain the logic here. We have a table, which consists of articles added by users. Each user can add the same article more than once. We then have a "wall" (similar to Facebook), where it shows both their own articles + any of their "friends". Here is the table structure:

CREATE TABLE IF NOT EXISTS `ReadingListArticles` (
`article_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id_fk` varchar(255) CHARACTER SET latin1 NOT NULL,
`url` varchar(255) CHARACTER SET latin1 NOT NULL,
`base_url` text CHARACTER SET latin1 NOT NULL,
`message` text COLLATE utf8_bin NOT NULL,
`timestamp` int(11) NOT NULL,
`image_path` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`image_id` varchar(15) CHARACTER SET latin1 DEFAULT NULL,
`title` varchar(200) COLLATE utf8_bin DEFAULT NULL,
`media_source` varchar(200) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`article_id`),
KEY `user_id_fk` (`user_id_fk`,`url`),
KEY `image_id` (`image_id`),
KEY `user_id_fk_2` (`user_id_fk`,`media_source`),
KEY `media_source` (`media_source`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=0;


The SQL is pretty simple:

SELECT * FROM ReadingListArticles WHERE
user_id_fk IN ('their-username','friend1',,'friend2','etc')
AND base_url NOT IN ('list','of','sources','to','ignore')
ORDER BY article_id DESC LIMIT 20


This works just fine - but with one issue ... if they AND their friend have both shared the same article, it will show up 2 times on their list.

I'm wondering (pretty sure this must be possible, but I just don't know how :)) .... can we filter them out so:


  • If the article exists by THEM (given priority), then show it

  • If it DOESN'T exist by them, we can show it from any of their friends



Hopefully I've explained myself well, but please let me know if you need any more clarification.

UPDATE: I'm getting a little bit closer :)

SELECT * FROM ReadingListArticles WHERE user_id_fk IN ('test123','test2','designer1')
AND article_id > 0
GROUP BY url
ORDER BY article_id DESC


This works, but its getting the wrong record (it gets the latest one, which would only be correct if the current user were the last person to post that exact URL... but in reality this won't happen). In my sample data, I have 2 records with the same "url":

url => http://wiadomosci.gazeta.pl


Then the user_id_fk varies:


  • test123

  • graphics1



The user who is requesting the data, is "test123"... so what we need to somehow do is prioritize that user_id_fk value when doing the GROUP BY, instead of just selecting the latest one to be added with that URL.

Anyone got any fresh ideas?

Thanks!

I'm now using (as suggested) the following query:

SELECT ReadingListArticles.* from
(
select max(article_id) as article_id
from ReadingListArticles
where user_id_fk = 'Beata1234'
and article_id > 170
group by url
union
select max(article_id) as article_id
from ReadingListArticles
where user_id_fk IN ('Beata1234')
and article_id > 170
and not exists (SELECT 1
from ReadingListArticles rla1
where rla1.user_id_fk = 'Beata1234'
and rla1.article_id > 170
and rla1.url = ReadingListArticles.url)
group by url
) q
join ReadingListArticles
on ReadingListArticles.article_id = q.article_id
order by ReadingListArticles.article_id desc
limit 20;


However, it has issues with the filtering. I have
article_id > 170
, but its still matching those with
article_id <= 170
. How can we fix this?

Answer

Since you want to treat the main user and the other users differently, you can just split them up. With article_id being the only unique key in the table, you can use

SELECT ReadingListArticles.* from 
(
 select max(article_id) as article_id
 from ReadingListArticles 
 where user_id_fk = 'test123'
 and article_id > 0
 group by url
union
 select max(article_id) as article_id
 from ReadingListArticles 
 where user_id_fk IN ('test2','designer1')
 and article_id > 0
 and not exists (SELECT 1
   from ReadingListArticles rla1
   where rla1.user_id_fk = 'test123'
   and rla1.article_id > 0
   and rla1.url = ReadingListArticles.url)
 group by url
) q
join ReadingListArticles
on ReadingListArticles.article_id = q.article_id
order by ReadingListArticles.article_id desc
limit 20;

You might want to add an index on url, user_id_fk, but maybe user_id_fk, url is fine, depending on your data.

This will give you:

  • The full row for all articles that user test123 posted. If test123 used an url several times, only the last article for this url will be chosen.
  • The full row for all articles that test2 and designer1 posted, but which url test123 hasn't have posted himself. If test2 and designer1 posted the same url, or if they posted the same url several times, only the last article of these is chosen.
Comments