hasam hasam - 21 days ago 4
Python Question

Sort and group by same key once

I want to group a list of URLs by their TLDs.

My code looks like this:

from itertools import groupby
from tldextract import extract

urls = sorted(urls, key=lambda x: extract(x).suffix)
grouped_urls = groupby(urls, key=lambda x: extract(x).suffix)


The problem is that I call extract 2*n times (where n == len(urls)): n times when sorting and another n times when grouping. Is it possible to reduce this to n calls?

Answer

If you first pair each URL with its suffix in a tuple, you can then sort and group on that stored suffix without recomputing it:

from itertools import groupby
from tldextract import extract

urls = ["www.example.com", "www.mytest.org", "www.test.com", "www.abc.com"]
urls = [(extract(url).suffix, url) for url in urls]

# Tuples sort by suffix first, so equal suffixes end up adjacent for groupby
for k, g in groupby(sorted(urls), key=lambda x: x[0]):
    print(k, list(g))

In this example you would get:

com [('com', 'www.abc.com'), ('com', 'www.example.com'), ('com', 'www.test.com')]
org [('org', 'www.mytest.org')]
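
To check that the key really is computed only once per URL, here is a minimal sketch of the same idea with a call counter. Note that naive_suffix is a hypothetical stand-in for extract(url).suffix (it just takes the text after the last dot, which is not how tldextract works), used only so the snippet runs without the library. The names calls, grouped and by_suffix are likewise illustrative. The second half shows a possible alternative, assuming you do not need the groups in sorted order: collecting into a dict of lists, which skips the sort entirely.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

calls = 0  # counts how often the key function runs


def naive_suffix(url):
    """Hypothetical stand-in for extract(url).suffix: text after the last dot."""
    global calls
    calls += 1
    return url.rsplit(".", 1)[-1]


urls = ["www.example.com", "www.mytest.org", "www.test.com", "www.abc.com"]

# Compute the key once per URL and keep it next to the URL.
decorated = [(naive_suffix(url), url) for url in urls]

# Sort and group on the stored key only; the sort is stable, so URLs
# keep their input order within each group.
decorated.sort(key=itemgetter(0))
grouped = {
    suffix: [url for _, url in group]
    for suffix, group in groupby(decorated, key=itemgetter(0))
}
calls_for_groupby = calls  # equals len(urls), not 2 * len(urls)

# Alternative without sorting: one pass, one key computation per URL.
by_suffix = defaultdict(list)
for url in urls:
    by_suffix[naive_suffix(url)].append(url)
```

With the real library you would swap naive_suffix for a function returning extract(url).suffix. The dict version avoids the O(n log n) sort, at the cost of not yielding groups in key order.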