passwd passwd - 1 year ago 75
Python Question

Match URLs by file path and GET parameters (but not their values)

How can I check whether any URL in my list matches the given URL? I need URLs to match only if all GET parameter names (not their values) and the path are the same. For example, I have this list:

links = [

url = ""

This example is
. But how can I do this matching in the most efficient way? I don't know what the given URL will look like.

Answer Source

Ideally, use Python's urlparse module (urllib.parse in Python 3). Parse your URL like so:

from urllib import parse as urlparse  # Python 3; on Python 2 use: import urlparse
url = ""
parsed_url = urlparse.urlparse(url)

Then create a data structure that looks something like this:

seen_pages = set() # Stores all pages you've already seen.

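Then add all your pages to it like so:

for page in list_of_pages:
    parsed_url = urlparse.urlparse(page)
    # Key each page by its path and the set of its GET parameter names (values are ignored).
    current_page = (parsed_url.path, frozenset(urlparse.parse_qs(parsed_url.query).keys()))
    seen_pages.add(current_page)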

This stores every page in the set as a tuple of the form (path, frozenset([param1, param2])).
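To make the shape of that key concrete, here is a minimal Python 3 sketch. The example.com URLs and the helper name key_for are made up for illustration and are not from the question. It also passes keep_blank_values=True, an assumption on my part: by default parse_qs drops parameters whose value is empty (such as "?q="), and the question asks to compare names regardless of values.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URLs: same path and parameter names, different values and order.
first = "http://example.com/search?q=cats&page=2"
second = "http://example.com/search?page=7&q=dogs"

def key_for(url):
    """Return (path, frozenset of GET parameter names) for a URL."""
    parts = urlparse(url)
    # keep_blank_values=True keeps names whose value is empty, e.g. "q" in "?q=".
    names = frozenset(parse_qs(parts.query, keep_blank_values=True))
    return (parts.path, names)

print(key_for(first) == key_for(second))  # True
```

Because the names are held in a frozenset, the key is hashable and does not depend on the order in which the parameters appear.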

To check whether you've already seen a page with exactly those parameter names, build the current_page tuple for it in the same way and test whether it is in the set. Lookup in and insertion into a set are O(1) on average, so each check takes constant time no matter how many pages are stored.
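Putting the build and lookup steps together, one possible end-to-end sketch in Python 3 follows. The contents of links, the helper names url_key and matches_any, and the use of keep_blank_values=True are assumptions for illustration; the question's actual list is not shown.

```python
from urllib.parse import urlparse, parse_qs

def url_key(url):
    """Build the lookup key: the path plus the set of GET parameter names."""
    parts = urlparse(url)
    return (parts.path, frozenset(parse_qs(parts.query, keep_blank_values=True)))

# Hypothetical list of known URLs.
links = [
    "http://example.com/item?id=1&ref=home",
    "http://example.com/list?page=3",
]

# Build the set once; each later lookup is a single hash probe.
seen_pages = {url_key(link) for link in links}

def matches_any(url):
    """True if some known URL has the same path and parameter names."""
    return url_key(url) in seen_pages

print(matches_any("http://example.com/item?ref=x&id=99"))  # True
print(matches_any("http://example.com/item?id=99"))        # False
```

Note that, as in the answer above, the key uses only the path and the parameter names, so URLs on different hosts with the same path would also match. If that matters, adding parts.netloc to the tuple is one way to tighten it.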
