
Optimize query on MySQL

I have a query that runs really slowly (15-20 seconds) when the data is not in memory, and quite fast (0.6s - 2s) when it is:

select count(distinct concat(conexiones.tMacAdres, date_format(conexiones.fFecha, '%Y%m%d'))) as Conexiones,
       sum(if(conexiones.tEvento = 'megusta', 1, 0)) as MeGusta,
       sum(if(conexiones.tEvento = 'megusta', conexiones.nAmigos, 0)) as ImpactosMeGusta,
       sum(if(conexiones.tEvento = 'checkin', 1, 0)) as CheckIn,
       sum(if(conexiones.tEvento = 'checkin', conexiones.nAmigos, 0)) as ImpactosCheckIn,
       min(conexiones.fFecha) as Fecha_Inicio,
       now() as Fecha_fin,
       datediff(now(), min(conexiones.fFecha)) as dias
from conexiones, instalaciones
where conexiones.idInstalacion = instalaciones.idInstalacion
  and conexiones.idInstalacion = 190
  and (fFecha between '2014-01-01 00:00:00' and '2016-06-18 23:59:59')
group by instalaciones.tNombre
order by instalaciones.idCliente


These are the table schemas.
instalaciones, with 1332 rows:

CREATE TABLE `instalaciones` (
`idInstalacion` int(10) unsigned NOT NULL AUTO_INCREMENT,
`idCliente` int(10) unsigned DEFAULT NULL,
`tRouterSerial` varchar(50) DEFAULT NULL,
`tFacebookPage` varchar(256) DEFAULT NULL,
`tidFacebook` varchar(64) DEFAULT NULL,
`tNombre` varchar(128) DEFAULT NULL,
`tMensaje` varchar(128) DEFAULT NULL,
`tWebPage` varchar(128) DEFAULT NULL,
`tDireccion` varchar(128) DEFAULT NULL,
`tPoblacion` varchar(128) DEFAULT NULL,
`tProvincia` varchar(64) DEFAULT NULL,
`tCodigoPosta` varchar(8) DEFAULT NULL,
`tLatitud` decimal(15,12) DEFAULT NULL,
`tLongitud` decimal(15,12) DEFAULT NULL,
`tSSID1` varchar(40) DEFAULT NULL,
`tSSID2` varchar(40) DEFAULT NULL,
`tSSID2_Pass` varchar(40) DEFAULT NULL,
`fSincro` datetime DEFAULT NULL,
`tEstado` varchar(10) DEFAULT NULL,
`tHotspot` varchar(10) DEFAULT NULL,
`fAlta` datetime DEFAULT NULL,
PRIMARY KEY (`idInstalacion`),
UNIQUE KEY `tRouterSerial` (`tRouterSerial`),
KEY `idInstalacion` (`idInstalacion`)
) ENGINE=InnoDB AUTO_INCREMENT=1332 DEFAULT CHARSET=utf8;


conexiones, with 2370365 rows:

CREATE TABLE `conexiones` (
`idConexion` int(10) unsigned NOT NULL AUTO_INCREMENT,
`idInstalacion` int(10) unsigned DEFAULT NULL,
`idUsuario` int(11) DEFAULT NULL,
`tMacAdres` varchar(64) DEFAULT NULL,
`tUsuario` varchar(128) DEFAULT NULL,
`tNombre` varchar(64) DEFAULT NULL,
`tApellido` varchar(64) DEFAULT NULL,
`tEmail` varchar(64) DEFAULT NULL,
`tSexo` varchar(20) DEFAULT NULL,
`fNacimiento` date DEFAULT NULL,
`nAmigos` int(11) DEFAULT NULL,
`tPoblacion` varchar(64) DEFAULT NULL,
`fFecha` datetime DEFAULT NULL,
`tEvento` varchar(20) DEFAULT NULL,
PRIMARY KEY (`idConexion`),
KEY `idInstalacion` (`idInstalacion`),
KEY `tMacAdress` (`tMacAdres`) USING BTREE,
KEY `fFecha` (`fFecha`),
KEY `idUsuario` (`idUsuario`),
KEY `insta_fecha` (`idInstalacion`,`fFecha`)
) ENGINE=InnoDB AUTO_INCREMENT=2370365 DEFAULT CHARSET=utf8;


This is the EXPLAIN output:

id | select_type | table         | type  | possible_keys                    | key           | key_len | ref   | rows   | Extra
1  | SIMPLE      | instalaciones | const | PRIMARY,idInstalacion            | PRIMARY       | 4       | const | 1      |
1  | SIMPLE      | conexiones    | ref   | idInstalacion,fFecha,insta_fecha | idInstalacion | 5       | const | 110234 | Using where


Thanks!

Edit: here is the output of SHOW TABLE STATUS LIKE 'conexiones':

Name:            conexiones
Engine:          InnoDB
Version:         10
Row_format:      Compact
Rows:            2305296
Avg_row_length:  151
Data_length:     350060544
Max_data_length: 0
Index_length:    331661312
Data_free:       75497472
Auto_increment:  2433305
Create_time:     28/06/2016 22:26
Update_time:     NULL
Check_time:      NULL
Collation:       utf8_general_ci
Checksum:        NULL
Create_options:
Comment:

Answer

Here's why it is so slow. And I will end with a possible speedup.

First, please do

SELECT COUNT(*)
    FROM conexiones
    WHERE idInstalacion = 190
      AND fFecha >= '2014-01-01'
      AND fFecha  < '2016-06-19';

in order to see how many rows we are dealing with. The EXPLAIN suggests 110234, but that is only a crude estimate.

Assuming there are 110K rows of conexiones involved in the query, and assuming the rows were inserted in (approximately) chronological order by fFecha, then...

  • There are a lot of rows to work with, and
  • They are scattered around the table on disk, hence
  • The query takes a lot of I/O, unless it is cached.

Let's further check on my last claim... How much RAM do you have? What is the value of innodb_buffer_pool_size? It should be about 70% of available RAM.
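A quick way to check both numbers (these are standard MySQL commands, nothing specific to this schema):

-- Configured size of the InnoDB buffer pool, in bytes:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- How many reads were served from the pool vs. had to hit disk; a large
-- Innodb_buffer_pool_reads relative to ..._read_requests means a lot of I/O:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

The setting itself goes in my.cnf under [mysqld]; on versions before 5.7 it requires a restart to change.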

Assuming that conexiones is too big to be 'cached' in the 'buffer_pool', we need to find a way to decrease the I/O.

There are 1332 different values for idInstalacion. Perhaps you insert 1332 rows every few minutes/hours into conexiones? Since the PRIMARY KEY is merely an AUTO_INCREMENT, those rows will be 'appended' to the end of the table.

Now let's look at where the idInstalacion=190 rows are. A new one of them occurs roughly every 1332 rows. That means they are spread out, and (probably) no two of them are in the same block (16KB in InnoDB). That means the 110234 rows will be in 110234 different blocks. That's about 2GB. If the buffer_pool is smaller than that, then there will be I/O. Even if it is bigger, that is a lot of data to touch.
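For the record, here is that arithmetic spelled out, using the 16KB InnoDB page size and the numbers already quoted (the 151-byte average row length comes from the SHOW TABLE STATUS above):

SELECT
    FLOOR(16384 / 151)                        AS approx_rows_per_16KB_block,
    ROUND(110234 * 16384 / POW(1024, 3), 2)   AS worst_case_gb_touched;
-- approx_rows_per_16KB_block ≈ 108 (so ~100 rows could share a block)
-- worst_case_gb_touched      ≈ 1.68 (one block per row -- the "about 2GB" above)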

But what to do about it? If we could arrange the =190 rows to be consecutive in the table, then the 2GB might drop to, say, 20MB -- a much more manageable and cacheable size. But how can that be done? By changing the PRIMARY KEY.

PRIMARY KEY(idInstalacion, fFecha, idConexion),
INDEX(idConexion)

and DROP any other indexes starting with idInstalacion or idConexion. To explain (a sample ALTER TABLE follows the list):

  • Since the PK is "clustered" with the data, all idInstalacion=190 rows over any consecutive fFecha range will be consecutive in the data. So fetching one block will get about 100 rows -- much less I/O.
  • A PK must be unique. Assuming (idInstalacion, fFecha) is not unique, I tacked on idConexion to make it unique.
  • I added INDEX(idConexion) to make AUTO_INCREMENT happy.
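A minimal sketch of that change, assuming the index names from the CREATE TABLE above (try it on a copy first; rebuilding a 2.3M-row InnoDB table takes a while, and on older MySQL versions it locks the table):

ALTER TABLE conexiones
    DROP PRIMARY KEY,
    DROP INDEX idInstalacion,
    DROP INDEX insta_fecha,
    ADD PRIMARY KEY (idInstalacion, fFecha, idConexion),
    ADD INDEX (idConexion);
-- idInstalacion and insta_fecha both start with idInstalacion, so the new PK
-- makes them redundant; INDEX(idConexion) keeps the AUTO_INCREMENT column
-- indexed, which MySQL requires.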

Potential drawback... Since this change rearranges the order of the data, other queries, including the INSERTs, may be slowed down. The INSERTs will be scattered, but not really slowed down: 1332 "hot spots" would be accepting the new rows, and that many blocks can easily be cached.

Arithmetic... If you have spinning drives (not SSDs), I would expect the existing structure to take about 1102 seconds for 110234 rows (roughly 10ms per random disk read). Since it is taking under 20 seconds, I suspect there is some caching (or you have SSDs), or the 110234 is grossly overestimated. My suggested change should decrease the "worst" time significantly, and slightly improve the "in memory" time. This "slight improvement" comes from being able to use the PK instead of a secondary key.

Caveat: Since 110234 * 1332 is nowhere near 2370365, much of my numerical analysis is probably nowhere near correct. For example, 2370365 rows with that schema is possibly less than 1GB. Please provide SHOW TABLE STATUS LIKE 'conexiones'.
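If it helps, a sketch of how to get the same size figures from information_schema (add an AND table_schema = '...' filter if the table name exists in more than one database):

SELECT table_name,
       ROUND(data_length  / POW(1024, 2)) AS data_mb,
       ROUND(index_length / POW(1024, 2)) AS index_mb
    FROM information_schema.TABLES
    WHERE table_name = 'conexiones';
-- data_mb and index_mb are the Data_length / Index_length columns from
-- SHOW TABLE STATUS, just converted to megabytes.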