chasenyc chasenyc - 3 months ago 9
MySQL Question

Querying Large Dataset on Join (15+ million rows)

I am trying to join two tables,

products
and
products_markets
. While
products
is under a million records,
product_markets
is closer to 20 million records. The data has been changed so there might be a typo or two in the schema create tables:

CREATE TABLE `products_markets` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`product_id` int(10) unsigned NOT NULL,
`country_code_id` int(10) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`product_id`,`country_code_id`)
) ENGINE=InnoDB AUTO_INCREMENT=21052102 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`manufacturer_id` int(10) unsigned NOT NULL,
`department_id` int(10) unsigned NOT NULL,
`code` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
`popularity` int(11) DEFAULT NULL,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`value` bigint(20) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `products_code_unique` (`code`),
KEY `products_department_id_foreign` (`department_id`),
KEY `products_manufacturer_id_foreign` (`manufacturer_id`),
CONSTRAINT `products_department_id_foreign`
FOREIGN KEY (`department_id`) REFERENCES `departments` (`id`),
CONSTRAINT `products_manufacturer_id_foreign`
FOREIGN KEY (`manufacturer_id`) REFERENCES `manufacturers` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=731563 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


I am trying to return 50 records of the most popular products available in a specific country and I am running into times around ~50 seconds which seems higher than expected.

I've tried a few different queries with no success:

select `products_markets`.`product_id`
from products_markets
left join
( SELECT products.id, products.popularity
from products
) p ON p.id = products_markets.product_id
and (products_markets.country_code_id = 121)
order by `popularity` desc, `p`.`id` asc
limit 50


and

select `products`.*
from `products`
where products.id in (
SELECT product_id
from products_markets
where products_markets.country_code_id = 121
)
group by `products`.`name`, `products`.`manufacturer_id`
order by `popularity` desc, `products`.`id` asc
limit 50


this query's explain is:

id select_type table type possible_keys key key_len refs rows extra
1 PRIMARY products ALL PRIMARY NULL NULL NULL 623848 Using temporary; Using filesort
1 PRIMARY products_markets ref unique_index unique_index 4 main.products.id 14 Using where; Using index; FirstMatch(products)


One option I am entertaining is splitting up products_markets into individual tables for each country to lessen the query. I've tried adding more memory to the server without much success. Can anyone identify anything glaringly wrong with the database design/query?

What other options are available to make this query a fraction of its current ~50 seconds?

Answer

Get rid of id in products_markets and add

PRIMARY KEY(country_code_id, product_id)

Then get rid of the UNIQUE key unless it is needed for some other query.

This will shrink the disk footprint of that large table significantly, thereby potentially speeding up all queries touching it.

And it will help with Hamaza's suggested reformulation.

Comments