damola damola - 5 months ago 11
Perl Question

Matching the TLD and file extension from the URL

I am working on a program and need to extract the TLD and the web page's file extension from a URL.

For example:

http://www.example.com/somedir/someotherdir/index.html
should give me the TLD
.com
and the extension
html


While this:
http://www.example.com.au/somedir/someotherdir/index/
should give me the TLD
.com.au
and the extension
null


Is there any way to do this with a regex in Perl? I am using the URI module, but it does not seem to support this type of extraction.

Answer

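If you're using the URI module, you can easily extract the host and path. Then it's a simple matter of taking everything after the last dot, or conversely removing everything up to and including the last dot. The extension needs slightly more care, so that a path with no extension is handled properly. Note that taking everything after the last dot of the host gives only the final label (au for www.example.com.au); a multi-part suffix such as .com.au cannot be recognized by pattern alone and needs a list of known suffixes.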

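# Keep only what follows the last dot in the host, e.g. "com".
(my $tld = $uri->host) =~ s/.*\.//;

# Keep only the last path segment, then what follows its last dot.
(my $extension = $uri->path) =~ s{.*/}{};
$extension = '' unless $extension =~ s/.*\.//;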
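To get .com.au rather than just au, the host has to be checked against known multi-part suffixes before falling back to the last label. Here is a minimal sketch of that idea, not a definitive implementation. The function name tld_and_extension and the three-entry suffix list are made up for illustration; real code would need a complete list such as the Public Suffix List. It takes plain strings so that it runs with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete list of multi-part suffixes.
# A real program would need a complete list (e.g. the Public Suffix List).
my @MULTI_LABEL_SUFFIXES = qw(com.au co.uk co.nz);

# Takes a host name and a URL path; returns (tld, extension).
# The extension is undef when the last path segment has none.
sub tld_and_extension {
    my ($host, $path) = @_;
    $host = lc $host;

    # Prefer a known multi-part suffix, else fall back to the last label.
    my $tld;
    for my $suffix (@MULTI_LABEL_SUFFIXES) {
        if ($host =~ /\.\Q$suffix\E\z/) {
            $tld = ".$suffix";
            last;
        }
    }
    ($tld) = $host =~ /(\.[^.]+)\z/ unless defined $tld;

    # The last path segment is whatever follows the final slash.
    my ($segment) = $path =~ m{([^/]*)\z};

    # The extension is whatever follows the last dot in that segment.
    my ($extension) = $segment =~ /\.([^.]+)\z/;

    return ($tld, $extension);
}

for my $case (
    [ 'www.example.com',    '/somedir/someotherdir/index.html' ],
    [ 'www.example.com.au', '/somedir/someotherdir/index/' ],
) {
    my ($tld, $ext) = tld_and_extension(@$case);
    printf "%s %s\n", $tld, defined $ext ? $ext : 'null';
}
# prints ".com html" then ".com.au null"
```

With a URI object you would call it as tld_and_extension($uri->host, $uri->path). Returning undef instead of an empty string for a missing extension is a design choice in this sketch, made to match the "null" the question asks for.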