Amar Myana Amar Myana -4 years ago 162
MySQL Question

Mysql where between query optimization

Below is the format of the database of Autonomous System Numbers ( download and parsed from this site! ).

range_start range_end number cc provider
----------- --------- ------ -- -------------------------------------
16778240 16778495 56203 AU AS56203 - BIGRED-NET-AU Big Red Group
16793600 16809983 18144 AS18144


745465 total rows

A Normal query looks like this:

select * from table where 3232235520 BETWEEN range_start AND range_end


Works properly but I query a huge number of IPs to check for their AS information which ends up taking too many calls and time.

Profiler Snapshot:

Blackfire profiler snapshot

I've two indexes:


  1. id column

  2. a combine index on the range_start and range_end column as both the make unique row.



Questions:


  1. Is there a way to query a huge number of IPs in a single query?


    • multiple
      where (IP between range_start and range_end) OR where (IP between range_start and range_end) OR ...
      works but I can't get the IP -> row mapping or which rows are retrieved for which IP.


  2. Any suggestions to change the database structure to optimize the query speed and decrease the time?



Any help will be appreciated! Thanks!

Answer Source

It is possible to query more than one IP address. Several approaches we could take. Assuming range_start and range_end are defined as integer types.

For a reasonable number of ip addresses, we could use an inline view:

 SELECT i.ip, a.*
   FROM (           SELECT 3232235520 AS ip
          UNION ALL SELECT 3232235521
          UNION ALL SELECT 3232235522
          UNION ALL SELECT 3232235523
          UNION ALL SELECT 3232235524
          UNION ALL SELECT 3232235525
        ) i
   LEFT 
   JOIN ip_to_asn a
     ON a.range_start <= i.ip
    AND a.range_end   >= i.ip
  ORDER BY i.ip

This approach will work for a reasonable number of IP addresses. The inline view could be extended with more UNION ALL SELECT to add additional IP addresses. But that's not necessarily going to work for a "huge" number.

When we get "huge", we're going to run into limitations in MySQL... maximum size of a SQL statement limited by max_allowed_packet, there may be a limit on the number of SELECT that can appear.

The inline view could be replaced with a temporary table, built first.

 DROP TEMPORARY TABLE IF EXISTS _ip_list_;
 CREATE TEMPORARY TABLE _ip_list_ (ip BIGINT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
 INSERT INTO _ip_list_ (ip) VALUES (3232235520),(3232235521),(3232235522),...;
 ...
 INSERT INTO _ip_list_ (ip) VALUES (3232237989),(3232237990);

Then reference the temporary table in place of the inline view:

 SELECT i.ip, a.*
   FROM _ip_list_ i
   LEFT
   JOIN ip_to_asn a
     ON a.range_start <= i.ip
    AND a.range_end   >= i.ip
  ORDER BY i.ip ;

And then drop the temporary table:

 DROP TEMPORARY TABLE IF EXISTS _ip_list_ ;

Some other notes:

Churning database connections is going to degrade performance. There's a significant amount overhead in establishing and tearing down a connection. That overhead get noticeable if the application is repeatedly connecting and disconnecting, if its doing that for every SQL statement being issued.

And running an individual SQL statement also has overhead... the statement has to be sent to the server, the statement parsed for syntax, evaluated from semantics, choose an execution plan, execute the plan, prepare a resultset, return the resultset to the client. And this is why it's more efficient to process set wise rather than row wise. Processing RBAR (row by agonizing row) can be very slow, compared to sending a statement to the database and letting it process a set in one fell swoop.

But there's a tradeoff there. With ginormous sets, things can start to get slow again.

Even if you can process two IP addresses in each statement, that halves the number of statements that need to be executed. If you do 20 IP addresses in each statement, that cuts down the number of statements to 5% of the number that would be required a row at a time.


And the composite index already defined on (range_start,range_end) is appropriate for this query.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download