McLeodx - 5 months ago
Python Question

Extract domain name only from url, getting rid of the path (Python)

I've been trying to extract the domain names from a list of URLs, so that

http://supremecosts.com/contact-us/
would become
http://supremecosts.com

I'm trying to find a clean way of doing it that will be adaptable to various gTLDs and ccTLDs.

Answer

You can do it with a regular expression that matches the scheme and host and stops at the first ':' or '/' that follows:

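import re

text = 'http://supremecosts.com/contact-us/'

# Capture the scheme plus host, stopping at the first ':', '/' or newline,
# so any port number or path is dropped. The raw string (r'...') avoids
# invalid-escape warnings; '/' needs no escaping in a Python regex.
m = re.search(r'(https?://[^:/\n]+)', text)
if m:
    print(m.group(1))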

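As an alternative to the regex, the standard library's urllib.parse can split the URL for you, and it does not depend on which TLD the host uses. The following is only a minimal sketch: the helper name base_url and the second sample URL are made up for illustration and are not part of the original answer.

```python
from urllib.parse import urlsplit


def base_url(url):
    """Return 'scheme://netloc' for url, dropping path, query and fragment.

    base_url is a hypothetical helper name used only for this sketch.
    """
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


# The second URL is an invented example to show a ccTLD-style host.
urls = [
    'http://supremecosts.com/contact-us/',
    'https://example.co.uk/a/b?c=d#e',
]

for url in urls:
    print(base_url(url))
```

One difference from the regex above: netloc keeps a port number (and any user:password@ prefix) if the URL has one, whereas the regex stops at the first ':' after the host. If you want only the host name, parts.hostname returns it without the port.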