Below is the format of the database of Autonomous System Numbers ( download and parsed from this site! ).
range_start range_end number cc provider
----------- --------- ------ -- -------------------------------------
16778240 16778495 56203 AU AS56203 - BIGRED-NET-AU Big Red Group
16793600 16809983 18144 AS18144
select * from table where 3232235520 BETWEEN range_start AND range_end
where (IP between range_start and range_end) OR where (IP between range_start and range_end) OR ...
It is possible to query more than one IP address. Several approaches we could take. Assuming
range_end are defined as integer types.
For a reasonable number of ip addresses, we could use an inline view:
SELECT i.ip, a.* FROM ( SELECT 3232235520 AS ip UNION ALL SELECT 3232235521 UNION ALL SELECT 3232235522 UNION ALL SELECT 3232235523 UNION ALL SELECT 3232235524 UNION ALL SELECT 3232235525 ) i LEFT JOIN ip_to_asn a ON a.range_start <= i.ip AND a.range_end >= i.ip ORDER BY i.ip
This approach will work for a reasonable number of IP addresses. The inline view could be extended with more
UNION ALL SELECT to add additional IP addresses. But that's not necessarily going to work for a "huge" number.
When we get "huge", we're going to run into limitations in MySQL... maximum size of a SQL statement limited by
max_allowed_packet, there may be a limit on the number of
SELECT that can appear.
The inline view could be replaced with a temporary table, built first.
DROP TEMPORARY TABLE IF EXISTS _ip_list_; CREATE TEMPORARY TABLE _ip_list_ (ip BIGINT NOT NULL PRIMARY KEY) ENGINE=InnoDB; INSERT INTO _ip_list_ (ip) VALUES (3232235520),(3232235521),(3232235522),...; ... INSERT INTO _ip_list_ (ip) VALUES (3232237989),(3232237990);
Then reference the temporary table in place of the inline view:
SELECT i.ip, a.* FROM _ip_list_ i LEFT JOIN ip_to_asn a ON a.range_start <= i.ip AND a.range_end >= i.ip ORDER BY i.ip ;
And then drop the temporary table:
DROP TEMPORARY TABLE IF EXISTS _ip_list_ ;
Some other notes:
Churning database connections is going to degrade performance. There's a significant amount overhead in establishing and tearing down a connection. That overhead get noticeable if the application is repeatedly connecting and disconnecting, if its doing that for every SQL statement being issued.
And running an individual SQL statement also has overhead... the statement has to be sent to the server, the statement parsed for syntax, evaluated from semantics, choose an execution plan, execute the plan, prepare a resultset, return the resultset to the client. And this is why it's more efficient to process set wise rather than row wise. Processing RBAR (row by agonizing row) can be very slow, compared to sending a statement to the database and letting it process a set in one fell swoop.
But there's a tradeoff there. With ginormous sets, things can start to get slow again.
Even if you can process two IP addresses in each statement, that halves the number of statements that need to be executed. If you do 20 IP addresses in each statement, that cuts down the number of statements to 5% of the number that would be required a row at a time.
And the composite index already defined on
(range_start,range_end) is appropriate for this query.