MeshMan MeshMan - 1 year ago 100
MySQL Question

MySql Simple Query Performance - 1m rows

I'm currently the developer behind the site Twitch Links (NSFW). I'm the sole creator and I'm starting to feel concerned about a simple query that is the back bone of the web site (I'm by no means a DBA).

To quickly summarize, it's collecting URL's from Twitch.TV chat. It's grabbed around 700k in 3 weeks. The site visually displays the most recent Imgur's and Youtube videos. I have a single "links" table, with a "favourites" table where I stored user favourited links (Id, LinkId, UserId).

Here's the query:

(SELECT COUNT(*) FROM favourites WHERE LinkId = links.Id) AS NumFavourites, # Is this bad per row?
FROM links
Channel LIKE "%%" AND # Can sometimes be populated with a single value, e.g. "Channel like '%riotgames%'"
Type IN ('Imgur', 'YouTube') AND # Can sometimes be "Type LIKE '%Imgur%'" or "Type LIKE '%Facebook%'" - there are about 20 different types.
Deleted = 0 AND # Out of 500k rows, about 100 will be deleted.
Id > 0 AND # Will be set to a high ID after first view to only return latest rows.
Data IS NOT NULL # Exclude badly parsed links.

Here's the explain plan:

Explain Plan

Here are the keys:

KEY `idx_links_Channel` (`Channel`),
KEY `idx_links_Type` (`Type`),
KEY `idx_links_CapturedOn` (`CapturedOn`)

The "Channel LIKE" statement can change when requesting links for a particular users favourite channels only. This becomes:

Channel IN (SELECT CONCAT('#', ChannelName) FROM channelfavourites WHERE UserId = X) AND

There is an index on the "favourites" table on its Id, UserId, LinkId columns.

There is an index on "channelfavourites" for column "UserId".

Here are my questions about it:

  1. Why is it showing that 382k rows are being scanned? Shouldn't the order by ID DESC LIMIT 40 always limit it to much less, ie, find 40 rows matching the WHERE then stop?

  2. When the sites been running a year, and it has say 12m rows, will this query still scale? Will MySQL be clever enough to realize with the "ORDER BY Id DESC" to start at the end of the data set on disk and work backwards?

I've been thinking to flatten this query completely to a more view style table "
SELECT l.* FROM validlinks v JOIN links l ON l.Id = v.LinkId ORDER BY Id DESC LIMIT 40
". But will that simple join be worth it?

Any input would be appreciated.


VERSION(): 5.7.10-log

Indexes: Indexes

Create SQL:

CREATE TABLE `links` (
`Type` varchar(45) DEFAULT NULL,
`URL` text,
`CapturedOn` datetime DEFAULT NULL,
`CapturedBy` text,
`Channel` varchar(100) DEFAULT NULL,
`Data` text,
`Deleted` bit(1) DEFAULT b'0',
`DonationId` varchar(100) DEFAULT NULL,
KEY `idx_links_Channel` (`Channel`),
KEY `idx_links_Type` (`Type`),
KEY `idx_links_CapturedOn` (`CapturedOn`)

Answer Source

If you want speed you're going to have to jettison a bunch of junk in this query.

Don't flag things as deleted, delete them. If you need to archive those, dump them in a secondary table. Just get them out of the way.

Clear out invalid data as aggressively as you can. This eliminates tests like NOT NULL in your query. You can filter any stray records you get in your application layer.

Don't use LIKE, it results in huge table scans. Instead use a full-text index. Those are significantly faster.

Build an index that incorporates all of your conditions. Try and order it such that the first items exclude the most data relative to the later ones.

If you had to index this as-is:

CREATE INDEX idx_links_for_searching (Deleted, Type, id)

Add in full-text and you're doing even better.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download