Richard H - 1 month ago
Java Question

Get the second-level domain of a URL (Java)

I am wondering if there is a parser or library in Java for extracting the second-level domain (SLD) from a URL or, failing that, an algorithm or regex for doing the same. For example:

URI uri = new URI("http://www.mydomain.ltd.uk/blah/some/page.html");
String host = uri.getHost();
System.out.println(host);


which prints:

www.mydomain.ltd.uk


Now what I'd like to do is robustly identify the SLD ("ltd.uk") component. Any ideas?

Edit: I'm ideally looking for a general solution, so I'd match ".uk" in "police.uk", ".co.uk" in "bbc.co.uk" and ".com" in "amazon.com".

Thanks

Answer

I don't know your purpose, but the second-level domain as such may not be what you need. You probably want to find the public suffix of the host (for example "ltd.uk" or "co.uk"); the domain one label below the public suffix is usually what you are looking for.
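
To make the idea concrete, here is a minimal sketch that matches a host against a tiny, hand-picked set of suffixes and returns the longest one that fits, plus the domain one label below it. The class name PublicSuffixSketch, the method names, and the four hardcoded rules are assumptions for illustration only; a real program would load the full public suffix list, and this sketch ignores the list's wildcard and exception rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PublicSuffixSketch {
    // Hypothetical hand-picked rules; the real list has thousands of entries.
    private static final Set<String> SUFFIXES =
            new HashSet<>(Arrays.asList("com", "uk", "co.uk", "ltd.uk"));

    /** Returns the longest known suffix of host, or null if none matches. */
    static String publicSuffix(String host) {
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            // Candidates shrink from the left, so the first hit is the longest.
            if (SUFFIXES.contains(candidate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1); // drop the leftmost label
        }
    }

    /** Returns the suffix plus one more label, or null if there is no such label. */
    static String registrableDomain(String host) {
        String suffix = publicSuffix(host);
        if (suffix == null || suffix.length() == host.length()) {
            return null;
        }
        String lower = host.toLowerCase(Locale.ROOT);
        // Everything to the left of ".<suffix>".
        String prefix = lower.substring(0, lower.length() - suffix.length() - 1);
        String label = prefix.substring(prefix.lastIndexOf('.') + 1);
        return label + "." + suffix;
    }

    public static void main(String[] args) {
        String[] hosts = {"www.mydomain.ltd.uk", "bbc.co.uk", "police.uk", "amazon.com"};
        for (String host : hosts) {
            System.out.println(host + " -> " + publicSuffix(host)
                    + " / " + registrableDomain(host));
        }
    }
}
```

With these four rules, "www.mydomain.ltd.uk" yields the suffix "ltd.uk" and the domain "mydomain.ltd.uk", which covers the cases in the question's edit.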

Apache HttpComponents (HttpClient 4) comes with classes that handle this:

org.apache.http.impl.cookie.PublicSuffixFilter
org.apache.http.impl.cookie.PublicSuffixListParser

You need to download the public suffix list from here:

http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
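
The downloaded file is plain text with one rule per line: lines starting with "//" are comments, a leading "*." marks a wildcard rule, and a leading "!" marks an exception to a wildcard. If you would rather not depend on the HttpClient classes, a rough sketch of reading that format with only the JDK could look like the following. The class name SuffixListReader and its three fields are hypothetical, and the sample lines in main are just there to exercise the parser; matching hosts against the parsed rules is left out.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class SuffixListReader {
    final Set<String> rules = new HashSet<>();      // plain rules, e.g. "co.uk"
    final Set<String> wildcards = new HashSet<>();  // "*.ck" is stored as "ck"
    final Set<String> exceptions = new HashSet<>(); // "!www.ck" is stored as "www.ck"

    /** Reads rules from text in the public suffix list format. */
    static SuffixListReader parse(Reader in) {
        SuffixListReader result = new SuffixListReader();
        // lines() reports read failures as UncheckedIOException.
        new BufferedReader(in).lines().forEach(result::addLine);
        return result;
    }

    private void addLine(String rawLine) {
        String line = rawLine.trim();
        if (line.isEmpty() || line.startsWith("//")) {
            return; // skip blank lines and comments
        }
        // A rule ends at the first whitespace character.
        String rule = line.split("\\s+")[0].toLowerCase(Locale.ROOT);
        if (rule.startsWith("!")) {
            exceptions.add(rule.substring(1));
        } else if (rule.startsWith("*.")) {
            wildcards.add(rule.substring(2));
        } else {
            rules.add(rule);
        }
    }

    public static void main(String[] args) {
        String sample = "// sample lines in the list's format\n\nuk\nco.uk\n*.ck\n!www.ck\n";
        SuffixListReader list = parse(new StringReader(sample));
        System.out.println("rules=" + list.rules.size()
                + " wildcards=" + list.wildcards.size()
                + " exceptions=" + list.exceptions.size());
    }
}
```

To use it on the real file, pass a FileReader (or an InputStreamReader with an explicit charset) for the downloaded list instead of the StringReader.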