spraff spraff - 1 month ago 8
MySQL Question

MySQL locate or remove string suffix from a known set

I want to write a MySQL stored procedure which will split a FQDN into host/authority/tld parts.

Let's say I have a list of known TLDs, and for the sake of illustration let's say it's the set

com
co.uk
uk


let's test it against these strings

input | output
----------------|-------
alpha.co.uk | alpha
mail.beta.uk | mail.beta


The output is the shortest substring of the input, starting from the beginning, such that
CONCAT(output,'.',tld)=input
for some
tld
which is a member of the given set.

Note that we need the shortest substring as the output, otherwise the output would be
alpha.co
in the first case, which is wrong.

I know how to write a MySQL function which tells me whether a given string is the suffix of another string, but here there are many possible such strings and any will do (provided no longer string is also a suffix of the input).

I know I could write a regex along the lines of
co\.uk|uk|com
but MySQL
REGEX
operator does not return the position of the match, just whether it matches or not.

Yes, I really do want a solution in SQL for this, not in the application language.

What's the best way to locate or remove the longest possible suffix, given a set of valid suffixes?

Answer

Here's one way to do that, relying on the fact that MIN() will yield the shortest of all the matches:

create table tld (tld varchar(100));
create table input (input varchar(100));

insert into tld values ('com'),('co.uk'),('uk');
insert into input values ('alpha.co.uk'),('mail.beta.com');

select 
    input.input as input, 
    min(substring(input.input, 1, length(input.input) - length(tld.tld) - 1)) as output 
from input inner join tld 
on input.input like concat('%.', tld.tld) group by input.input;

OR, if you only have a single value for input, then:

set @input = 'alpha.co.uk';

select min(substring(@input, 1, length(@input) - length(tld.tld) - 1)) as output
from tld
where @input like concat('%.', tld.tld);