Asked by Rams

MySQL composite index

We are using MySQL as our database.

The following queries run on a MySQL table with approximately 25 million records; I have pasted three of them here. They run too slowly, and I was wondering whether better composite indexes might improve the situation.

Any idea what the best composite index would be?

Also, is a composite index actually required for these queries?

FIRST QUERY

EXPLAIN SELECT log_type,
count(DISTINCT subscriber_id) AS distinct_count,
count(*) as total_count
FROM stats.campaign_logs
WHERE domain = 'xxx'
AND campaign_id='12345'
AND log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
AND log_time BETWEEN CONVERT_TZ('2015-02-12 00:00:00','+05:30','+00:00')
AND CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type


EXPLAIN output for the above query:

+----+-------------+---------------+-------------+--------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------------+--------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+
| 1 | SIMPLE | campaign_logs | index_merge | campaign_id_index,domain_index,log_type_index,log_time_index | campaign_id_index,domain_index | 153,153 | NULL | 35683 | Using intersect(campaign_id_index,domain_index); Using where; Using filesort |
+----+-------------+---------------+-------------+--------------------------------------------------------------+--------------------------------+---------+------+-------+------------------------------------------------------------------------------+


SECOND QUERY

SELECT campaign_id
, subscriber_id
, campaign_name
, log_time
, log_type
, message
, UNIX_TIMESTAMP(log_time) AS time
FROM campaign_logs
WHERE domain = 'xxx'
AND log_type = 'EMAIL_OPENED'
ORDER BY log_time DESC
LIMIT 20;


EXPLAIN output for the above query:

+----+-------------+---------------+-------------+-----------------------------+-----------------------------+---------+------+--------+---------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------------+-----------------------------+-----------------------------+---------+------+--------+---------------------------------------------------------------------------+
| 1 | SIMPLE | campaign_logs | index_merge | domain_index,log_type_index | domain_index,log_type_index | 153,153 | NULL | 118392 | Using intersect(domain_index,log_type_index); Using where; Using filesort |
+----+-------------+---------------+-------------+-----------------------------+-----------------------------+---------+------+--------+---------------------------------------------------------------------------+


THIRD QUERY

EXPLAIN SELECT *, UNIX_TIMESTAMP(log_time) AS time
FROM stats.campaign_logs
WHERE domain = 'xxx'
AND log_type <> 'EMAIL_SLEEP'
AND subscriber_id = '123'
ORDER BY log_time DESC
LIMIT 100


EXPLAIN output for the above query:

+----+-------------+---------------+------+-------------------------------------------------+---------------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+-------------------------------------------------+---------------------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | campaign_logs | ref | subscriber_id_index,domain_index,log_type_index | subscriber_id_index | 153 | const | 35 | Using where; Using filesort |
+----+-------------+---------------+------+-------------------------------------------------+---------------------+---------+-------+------+-----------------------------+


If you need any other details, I can provide them here.

UPDATE (2016/April/22):
We now want to add one more column, node_id, to the existing table. One campaign can have multiple nodes, and every report we currently generate per campaign we now also need per individual node.

For example:

SELECT log_type,
count(DISTINCT subscriber_id) AS distinct_count,
count(*) as total_count
FROM stats.campaign_logs
WHERE domain = 'xxx'
AND campaign_id='12345'
AND node_id = '34567',
AND log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
AND log_time BETWEEN CONVERT_TZ('2015-02-12 00:00:00','+05:30','+00:00')
AND CONVERT_TZ('2015-02-19 23:59:58','+05:30','+00:00')
GROUP BY log_type

CREATE TABLE `camp_logs` (
`domain` varchar(50) DEFAULT NULL,
`campaign_id` varchar(50) DEFAULT NULL,
`subscriber_id` varchar(50) DEFAULT NULL,
`message` varchar(21000) DEFAULT NULL,
`log_time` datetime DEFAULT NULL,
`log_type` varchar(50) DEFAULT NULL,
`level` varchar(50) DEFAULT NULL,
`campaign_name` varchar(500) DEFAULT NULL,
KEY `subscriber_id_index` (`subscriber_id`),
KEY `log_type_index` (`log_type`),
KEY `log_time_index` (`log_time`),
KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
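The domain_logtype_logtime_index in this definition lines up with the second query: equality on domain and log_type, followed by log_time, means matching rows can be read already ordered by log_time, so ORDER BY log_time DESC LIMIT 20 should not need a filesort. The sketch below checks that idea using Python's built-in sqlite3 module as a stand-in for MySQL. That is an assumption: SQLite's planner and its EXPLAIN QUERY PLAN wording differ from MySQL's EXPLAIN, UNIX_TIMESTAMP is dropped because SQLite lacks it, and only that one index is created.

```python
import sqlite3

# Stand-in only: SQLite, not MySQL. Same columns as the table above, minus `level`,
# which the second query doesn't touch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_logs (domain TEXT, campaign_id TEXT, subscriber_id TEXT, "
    "message TEXT, log_time TEXT, log_type TEXT, campaign_name TEXT)"
)
# Same column order as domain_logtype_logtime_index in the definition above.
conn.execute(
    "CREATE INDEX domain_logtype_logtime_index "
    "ON campaign_logs (domain, log_type, log_time)"
)

# The second query, without UNIX_TIMESTAMP (not available in SQLite).
second_query = """
    SELECT campaign_id, subscriber_id, campaign_name, log_time, log_type, message
    FROM campaign_logs
    WHERE domain = 'xxx' AND log_type = 'EMAIL_OPENED'
    ORDER BY log_time DESC
    LIMIT 20
"""

# EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + second_query)]
for step in plan:
    print(step)
```

If the plan had a "USE TEMP B-TREE FOR ORDER BY" step, the index would not be supplying the order; here the index is scanned in reverse instead.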


Size issue

Because we have two composite indexes, the index file is growing rapidly. Current table statistics:
Data size : 30 GB
Index size: 35 GB

For node-level reports, we want to update our existing composite index

from

KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),


to

KEY `campid_domain_logtype_logtime_subid_nodeid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`,`node_id`)
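One thing to keep in mind with the proposed index: index columns are used left to right, and once a range condition is reached (here the log_time BETWEEN), later columns such as node_id cannot narrow the range of index entries that get scanned; they can only filter those entries afterwards. A minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL, shows this on the node-level query: the search constraints listed in the plan do not include node_id. The CONVERT_TZ bounds are worked out by hand (subtracting 5h30m), because SQLite has no CONVERT_TZ.

```python
import sqlite3

# Stand-in only: SQLite, not MySQL. node_id is the column the update proposes to add.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_logs (domain TEXT, campaign_id TEXT, subscriber_id TEXT, "
    "log_time TEXT, log_type TEXT, node_id TEXT)"
)
# The proposed index, with node_id appended as the last column.
conn.execute(
    "CREATE INDEX campid_domain_logtype_logtime_subid_nodeid_index ON campaign_logs "
    "(campaign_id, domain, log_type, log_time, subscriber_id, node_id)"
)

# CONVERT_TZ(..., '+05:30', '+00:00') bounds precomputed: 00:00:00 -> previous day 18:30:00.
node_query = """
    SELECT log_type, COUNT(DISTINCT subscriber_id), COUNT(*)
    FROM campaign_logs
    WHERE domain = 'xxx' AND campaign_id = '12345' AND node_id = '34567'
      AND log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
      AND log_time BETWEEN '2015-02-11 18:30:00' AND '2015-02-19 18:29:58'
    GROUP BY log_type
"""

# Keep only the detail text of each plan row.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + node_query)]
for step in plan:
    print(step)
```

Since every referenced column is in the index, the node_id check still happens inside the index without reading table rows; it just does not shrink the scanned range.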


Could you suggest suitable composite indexes for both campaign-level and node-level reports?

Thanks

Answer

This is your first query:

SELECT A.log_type, count(*) as distinct_count, sum(A.total_count) as total_count
from (SELECT log_type, count(subscriber_id) as total_count
      FROM stats.campaign_logs
      WHERE domain = 'xxx' AND campaign_id = '12345' AND
            log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND
             DATE(CONVERT_TZ(log_time,'+00:00','+05:30')) BETWEEN DATE('2015-02-12 00:00:00') AND DATE('2015-02-19 23:59:58')
      GROUP BY subscriber_id,log_type) A
GROUP BY A.log_type;

It is better written as:

      SELECT log_type, count(DISTINCT subscriber_id) AS distinct_count, count(subscriber_id) AS total_count
      FROM stats.campaign_logs
      WHERE domain = 'xxx' AND campaign_id = '12345' AND
            log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED') AND
             DATE(CONVERT_TZ(log_time, '+00:00', '+05:30')) BETWEEN DATE('2015-02-12 00:00:00') AND DATE('2015-02-19 23:59:58')
      GROUP BY log_type;
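The derived-table form and a single GROUP BY using COUNT(DISTINCT subscriber_id) and COUNT(subscriber_id) should return the same counts as long as subscriber_id is never NULL (with NULLs, the derived-table version counts the NULL group in distinct_count, while COUNT(DISTINCT ...) skips it). A small check with Python's sqlite3 module and made-up rows, assuming SQLite as a stand-in for MySQL and a plain log_time range in place of the CONVERT_TZ expression:

```python
import sqlite3

# Stand-in only: SQLite, not MySQL; the rows below are made-up test data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_logs (domain TEXT, campaign_id TEXT, "
    "subscriber_id TEXT, log_time TEXT, log_type TEXT)"
)
rows = [
    ("xxx", "12345", "s1", "2015-02-13 10:00:00", "EMAIL_SENT"),
    ("xxx", "12345", "s1", "2015-02-14 10:00:00", "EMAIL_SENT"),
    ("xxx", "12345", "s2", "2015-02-14 11:00:00", "EMAIL_SENT"),
    ("xxx", "12345", "s1", "2015-02-15 09:00:00", "EMAIL_OPENED"),
    ("xxx", "99999", "s3", "2015-02-15 09:00:00", "EMAIL_OPENED"),  # other campaign
    ("xxx", "12345", "s4", "2015-03-01 09:00:00", "EMAIL_SENT"),    # outside the range
]
conn.executemany("INSERT INTO campaign_logs VALUES (?, ?, ?, ?, ?)", rows)

where = """
    domain = 'xxx' AND campaign_id = '12345'
    AND log_type IN ('EMAIL_SENT', 'EMAIL_CLICKED', 'EMAIL_OPENED', 'UNSUBSCRIBED')
    AND log_time BETWEEN '2015-02-12 00:00:00' AND '2015-02-19 23:59:58'
"""

# Derived-table form: group per (subscriber, type) first, then per type.
nested = conn.execute(f"""
    SELECT A.log_type, COUNT(*) AS distinct_count, SUM(A.total_count) AS total_count
    FROM (SELECT log_type, COUNT(subscriber_id) AS total_count
          FROM campaign_logs WHERE {where}
          GROUP BY subscriber_id, log_type) A
    GROUP BY A.log_type ORDER BY A.log_type
""").fetchall()

# Single-pass form with COUNT(DISTINCT ...).
flat = conn.execute(f"""
    SELECT log_type, COUNT(DISTINCT subscriber_id) AS distinct_count,
           COUNT(subscriber_id) AS total_count
    FROM campaign_logs WHERE {where}
    GROUP BY log_type ORDER BY log_type
""").fetchall()

print(nested)  # [('EMAIL_OPENED', 1, 1), ('EMAIL_SENT', 2, 3)]
print(flat)    # [('EMAIL_OPENED', 1, 1), ('EMAIL_SENT', 2, 3)]
```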

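The best index for this is probably campaign_logs(domain, campaign_id, log_type, log_time, subscriber_id). It is a covering index for the query: every column the query reads is in the index, so MySQL does not have to read the table rows at all. Because log_time is wrapped in DATE(CONVERT_TZ(...)), only the first three columns can be used for the WHERE filtering; log_time and subscriber_id are there to make the index covering.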
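One way to see both the covering-index effect and what wrapping log_time in a function costs is to compare query plans. This sketch again uses SQLite through Python's sqlite3 module as a stand-in, so the plan wording is SQLite's rather than MySQL's EXPLAIN. The index name domain_campid_logtype_logtime_subid is made up for the example, log_type is simplified to a single equality, and date(log_time) stands in for the DATE(CONVERT_TZ(...)) expression.

```python
import sqlite3

# Stand-in only: SQLite, not MySQL; the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_logs (domain TEXT, campaign_id TEXT, subscriber_id TEXT, "
    "message TEXT, log_time TEXT, log_type TEXT, campaign_name TEXT)"
)
conn.execute(
    "CREATE INDEX domain_campid_logtype_logtime_subid ON campaign_logs "
    "(domain, campaign_id, log_type, log_time, subscriber_id)"
)

# Simplified filters: a single log_type instead of the IN list.
select = (
    "SELECT log_type, COUNT(DISTINCT subscriber_id), COUNT(subscriber_id) "
    "FROM campaign_logs WHERE domain = 'xxx' AND campaign_id = '12345' "
    "AND log_type = 'EMAIL_OPENED'"
)
# Plain range on the column: the index can also narrow on log_time.
range_query = (
    select + " AND log_time BETWEEN '2015-02-12 00:00:00' AND '2015-02-19 23:59:58'"
    " GROUP BY log_type"
)
# Column wrapped in a function: the index can only narrow on the first three columns.
wrapped_query = (
    select + " AND date(log_time) BETWEEN '2015-02-12' AND '2015-02-19'"
    " GROUP BY log_type"
)


def plan_of(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_range = plan_of(range_query)
plan_wrapped = plan_of(wrapped_query)
print(plan_range)
print(plan_wrapped)
```

Both plans should read the index as a covering index; only the range version lists log_time among the search constraints, while the wrapped version evaluates the date filter on every index entry matching the first three columns.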