Nicholas Nicholas - 26 days ago
MySQL Question

How do I optimise a large table in MySQL, and when can I benefit from partitioning?

EDIT 16-11-11 SOLUTION

Still creating the partitions to see if I can benefit from this feature, but I believe the biggest issue was server configuration.

The innodb_buffer_pool_size was far too low: it was left at the 8M default, whereas Rick James recommends about 70% of RAM, so I increased it to 4G on a 12G shared server and the results were astounding. The slow query completes in 1.6 seconds and everything is much faster; even my web application is flying. I'll post results once I have partitioning in place too, but I do feel this is probably the best solution to my lag problem.
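In case it helps anyone, this is roughly how the change can be verified and applied (a sketch; innodb_buffer_pool_size is only resizable online on MySQL 5.7+, on older versions it has to be set in my.cnf followed by a restart):

-- Check the current buffer pool size in GB
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- MySQL 5.7+ only: resize online; older versions need my.cnf + restart
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024; -- 4G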

I am in the process of refactoring a large table that logs telemetry data. It has been running for about 4-5 months and has generated approx. 54 million records with an average row size of approx. 380 bytes.

I have started to see some performance lag on one of my raw data queries that returns all logs for a device over a 24 hour period.

Initially I thought it was indexing, but I think it is the amount of I/O that needs to be processed by MySQL. A typical 24-hour query returns between 2.2k and 9k records, and I’d actually like to support an export of about 7 days.

I am not experienced in database performance tuning, so I am still learning the ropes. I am considering a few strategies.


  1. Tweak compound indexes according to the raw-data queries, although I think my indexes are OK as the EXPLAIN plan is showing a 100% hit rate.

  2. Consider creating a covering index that includes all the columns the query needs.

  3. Implement ranged partitioning by date:
    a) Keep monthly partitions. E.g. last 6 months
    b) Move anything older to archive table.

  4. Create a separate table (vertical partitioning) holding the raw data, joined back to the primary query table by ID. Not sure this is my problem as my indexes are working.

  5. Change my queries to pull data in batches with limits: order by created date, LIMIT X, and repeat until no more records are returned.

  6. Review server configuration



1, 2 (INDEXES):
I’ll rework my indexes against my queries, but I think I am good here as EXPLAIN is showing a 100% hit, unless I am reading it wrong.

I’ll try a covering index when they are rebuilt, but how do I determine the knock-on effects of a bad choice, e.g. compromised insert speeds?
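For illustration, a hypothetical covering index for a much narrower export (this would not cover my real query, which selects nearly every column, and every extra index slows inserts):

-- Hypothetical: covers only a position-export projection
ALTER TABLE message_log
  ADD INDEX dev_created_pos (device_id, created, lat, lon, speed);

-- EXPLAIN should then show "Using index" (no table reads) for:
-- SELECT device_id, created, lat, lon, speed
-- FROM message_log
-- WHERE device_id = @IMEI
--   AND created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR);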

How would I best monitor the performance of my table in a live environment?

EDIT: I've just started using the slow log file, which looks like a good tool for finding issues, and I suppose a query on the performance_schema might be another option?
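A sketch of both (the digest table is a standard performance_schema table from MySQL 5.6; the 1-second threshold is just an example):

-- Enable the slow query log at runtime (can also go in my.cnf)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log anything slower than 1 second

-- Top statements by total time; timer columns are in picoseconds
SELECT DIGEST_TEXT,
       COUNT_STAR AS executions,
       ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_sec,
       ROUND(AVG_TIMER_WAIT / 1e12, 4) AS avg_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;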

3 (PARTITIONING):
I have read a bit about partitions and I am not sure whether the size of my data would make much of a difference.

Rick James suggests >1M records; I’m at 54M and would like to keep around 300M prior to archiving. Is my table complex enough to benefit?

I will have to test this out myself, as I have no experience with any of this and it is all theoretical to me. I just don’t want to go down this path if it isn’t suitable for my needs.
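For my own testing, this is the sort of DDL I have in mind (a sketch only: partition names and boundaries are examples, rows with NULL created land in the lowest partition, and the PK has to be widened first because the partition column must be part of every unique key):

-- Widen the PK so it includes the partition column
ALTER TABLE message_log
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (db_id, created);

-- Monthly RANGE partitions on created
ALTER TABLE message_log
  PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p201607 VALUES LESS THAN (TO_DAYS('2016-08-01')),
    PARTITION p201608 VALUES LESS THAN (TO_DAYS('2016-09-01')),
    -- ... one partition per month ...
    PARTITION pmax VALUES LESS THAN MAXVALUE
  );

-- Archiving a month then becomes a near-instant metadata swap (5.6+),
-- given an identical, unpartitioned archive table:
-- ALTER TABLE message_log EXCHANGE PARTITION p201607 WITH TABLE message_log_archive;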

4 (Vertical partitioning via ‘joined’ detail table): I don’t think I am having table-scan issues, and I need all rows, so I am not sure this technique would be of benefit.
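If I did go down this path, I imagine something like this (hypothetical table name; note that partitioned InnoDB tables cannot participate in foreign keys, so the FK would have to go if I also partition message_log):

-- Hypothetical detail table holding only the wide TEXT payload
CREATE TABLE message_log_raw (
  db_id int(10) unsigned NOT NULL,
  raw text,
  PRIMARY KEY (db_id),
  CONSTRAINT fk_message_log_raw
    FOREIGN KEY (db_id) REFERENCES message_log (db_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Joined back only when the payload is actually needed:
-- SELECT ml.db_id, r.raw
-- FROM message_log ml
-- JOIN message_log_raw r ON r.db_id = ml.db_id
-- WHERE ml.db_id = @SOME_ID;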

5 (Use limits and fetch again): Would this free up the server by using less of its time in a single request? Would I see better I/O throughput at the cost of more commands on the same connection?
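As I understand it, the usual way to do this is keyset pagination on db_id rather than OFFSET (a sketch; @LAST_SEEN_ID is a placeholder I am introducing, starting at 0, and the batch size is arbitrary):

-- Repeat, feeding back the last db_id seen, until fewer than 1000 rows return
SELECT ml.db_id, ml.created, ml.device_id, ml.lat, ml.lon, ml.speed
FROM message_log ml
WHERE ml.device_id = @IMEI
  AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR)
  AND ml.db_id > @LAST_SEEN_ID
ORDER BY ml.db_id
LIMIT 1000;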

6 (Review Config): The other piece would be to review the default (non-developer) configuration that is used when you install MySQL; perhaps there are some settings that can be adjusted? :-)
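For reference, the kind of my.cnf entries I am now reviewing (the 4G figure is what worked on my 12G shared server; the rest are illustrative starting points, not recommendations):

[mysqld]
# roughly 70% of RAM on a dedicated box; less on a shared one
innodb_buffer_pool_size = 4G
# example value; redo log sizing affects write throughput (restart required)
innodb_log_file_size = 512M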

Thanks for reading, keen to hear any and all suggestions.

The following is FYI:

TABLE:

CREATE TABLE `message_log` (
  `db_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `db_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `created` datetime DEFAULT NULL,
  `device_id` int(10) unsigned NOT NULL,
  `display_name` varchar(50) DEFAULT NULL,
  `ignition` binary(1) DEFAULT NULL COMMENT 'This is actually IO8 from the falcom device',
  `sensor_a` float DEFAULT NULL,
  `sensor_b` float DEFAULT NULL,
  `lat` double DEFAULT NULL COMMENT 'default GPRMC format ddmm.mmmm',
  `lon` double DEFAULT NULL COMMENT 'default GPRMC longitude format dddmm.mmmm',
  `heading` float DEFAULT NULL,
  `speed` float DEFAULT NULL,
  `pos_validity` char(1) DEFAULT NULL,
  `device_temp` float DEFAULT NULL,
  `device_volts` float DEFAULT NULL,
  `satellites` smallint(6) DEFAULT NULL, /* TINYINT will suffice */
  `navdist` double DEFAULT NULL,
  `navdist2` double DEFAULT NULL,
  `IO0` binary(1) DEFAULT NULL COMMENT 'Duress',
  `IO1` binary(1) DEFAULT NULL COMMENT 'Fridge On/Off',
  `IO2` binary(1) DEFAULT NULL COMMENT 'Not mapped',
  `msg_name` varchar(20) DEFAULT NULL, /* Will be removed */
  `msg_type` varchar(16) DEFAULT NULL, /* Will be removed */
  `msg_id` smallint(6) DEFAULT NULL,
  `raw` text, /* Not needed in primary query, considering adding to single table mapped to this ID or a UUID correlation ID to save on @ROWID query */
  PRIMARY KEY (`db_id`),
  KEY `Name` (`display_name`),
  KEY `Created` (`created`),
  KEY `DeviceID_AND_Created` (`device_id`,`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


DeviceID_AND_Created is the main index. I need the clustered PK because I am using the record ID in a summary table that keeps track of the last message for a given device. created would be the partition column, so I guess it would also have to be added to the clustered PK?

QUERY:

SELECT
  ml.db_id, ml.db_created, ml.created, ml.device_id, ml.display_name,
  bin(ml.ignition) AS `ignition`, bin(ml.IO0) AS `duress`, bin(ml.IO1) AS `fridge`,
  ml.sensor_a, ml.sensor_b, ml.lat, ml.lon, ml.heading, ml.speed, ml.pos_validity,
  ml.satellites, ml.navdist2, ml.navdist, ml.device_temp, ml.device_volts, ml.msg_id
FROM message_log ml
WHERE ml.device_id = @IMEI
  AND ml.created BETWEEN @STARTDATE AND DATE_ADD(@STARTDATE, INTERVAL 24 HOUR)
ORDER BY ml.db_id;


This returns all logs for a given 24-hour period, which at the moment is approx. 2.2k rows with an average row size of 381 bytes, and will be reduced once I remove the TEXT field (raw).

EDIT - Include EXPLAIN

Moved the database to a dedicated disk and I think performance has improved on smaller datasets, but I am still seeing around 6-9 seconds on 6k rows. I just inspected the slow log file, which is very helpful, and found some slower queries and other things that I can improve in the application. :-)

The query is largely the same, just with more data to return. (EXISTS was removed from my test, but it would be good to know whether this is better constructed as a JOIN; see the sketch after the query fragment.)

WHERE ml.IMEI = @IMEI
AND EXISTS
  (SELECT 1 FROM devices WHERE imei_number = @IMEI AND customer_id = 403)
AND ml.created BETWEEN '2016-11-06 13:30:00' AND DATE_ADD('2016-11-06 13:30:00', INTERVAL 24 HOUR)
ORDER BY db_id;
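For comparison, here is how I imagine the EXISTS rewritten as a JOIN (assuming devices.imei_number is unique per device, so the join cannot duplicate rows):

SELECT ml.*
FROM message_log ml
JOIN devices d
  ON d.imei_number = ml.IMEI AND d.customer_id = 403
WHERE ml.IMEI = @IMEI
  AND ml.created BETWEEN '2016-11-06 13:30:00' AND DATE_ADD('2016-11-06 13:30:00', INTERVAL 24 HOUR)
ORDER BY ml.db_id;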


Updated EXPLAIN for the new query (screenshot not reproduced here). Note that IMEI is equivalent to device_id.


Answer

Implement ranged partitioning by date: a) Keep monthly partitions. E.g. last 6 months b) Move anything older to archive table.

This is a very good idea. I guess all the writes will be in the newest partition and you will query recent data only. You always want your data and indexes to fit in memory, so there is no disk I/O on reads.

Depending on your use case it might even be wise to have one partition per week. Then you only have to keep at most two weeks of data in memory to read the last 7 days.
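A minimal sketch of weekly boundaries, assuming the same RANGE-on-TO_DAYS(created) scheme discussed in the question (partition names and dates are examples):

ALTER TABLE message_log
  PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION w2016_45 VALUES LESS THAN (TO_DAYS('2016-11-14')),
    PARTITION w2016_46 VALUES LESS THAN (TO_DAYS('2016-11-21')),
    -- ... one partition per week ...
    PARTITION wmax VALUES LESS THAN MAXVALUE
  );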

You might also want to tune your buffer sizes: innodb_buffer_pool_size if you are using InnoDB as the engine, or the MyISAM key cache (key_buffer_size) if you are using MyISAM.

Adding RAM to the DB machine usually helps as well, since the OS can then cache the data files in memory.

If you have heavy writes you can also tune other options, e.g. innodb_log_buffer_size, and innodb_flush_log_at_trx_commit, which controls how often writes are persisted to disk. The aim is to let dirty pages stay in memory longer and avoid writing them back to disk too often.
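An illustrative example of those knobs (values are examples, not recommendations; the second setting trades a little durability for faster commits):

[mysqld]
# larger redo log buffer: big transactions avoid flushing mid-flight
innodb_log_buffer_size = 64M
# 2 = write at commit, fsync roughly once per second
innodb_flush_log_at_trx_commit = 2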