coolerfarmer coolerfarmer - 4 months ago 9
SQL Question

Query data from table based of entries in another table

So the title may be a little confusing, but it's hard to describe the problem in just a few words. So I have two tables that are important right now:

CREATE TABLE IF NOT EXISTS `private_crawler_urls` (
`id` int(11) NOT NULL,
`url` text NOT NULL,
`hash` varchar(47) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB AUTO_INCREMENT=100 DEFAULT CHARSET=latin1;

CREATE TABLE IF NOT EXISTS `private_crawler_url_checks` (
`id` int(11) NOT NULL,
`url_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`total_proxies` int(11) NOT NULL,
`working_proxies` int(11) NOT NULL
) ENGINE=InnoDB AUTO_INCREMENT=78 DEFAULT CHARSET=latin1;


Now I want to select all the urls from
private_crawler_urls
where there's no entry in
private_crawler_url_checks
which is newer than a given Timespan (e.g. 10 Minutes). That's what I have right now:

SELECT
u.id, u.url, c.created_at
FROM
private_crawler_urls u
INNER JOIN
private_crawler_url_checks c ON (c.url_id = u.id)
WHERE
c.created_at < NOW() - INTERVAL 10 MINUTE
ORDER BY c.created_at ASC


The problem is that this can't work as I don't want to delete the old entries in
private_crawler_url_checks
and so there's always an entry that's older than the given Timespan even if there's a fresh one in the table.
As I'm pretty new to MySQL I have no Idea how I can accomplish this, so I need your help. Thanks! In case you need more information, just leave a comment!

Answer

I think this will make you keep your old records and not interfere with your wish. Not all that certain about performance impact though:

SELECT
    pcu.url
FROM
    private_crawler_urls as pcu
WHERE
    pcu.id NOT IN (

        SELECT
            pcuc.url_id
        FROM
            private_crawler_url_checks as pcuc
        WHERE
            pcuc.created_at > DATE_SUB(NOW(), INTERVAL 10 MINUTE)       
)

First it selects all url_id, where there is a created_at in the last 10 minutes. After that it will select all url where the id is not in this list