Kunal Saxena - 7 months ago
Python Question

Parsing URL for scraping tasks

I want to parse some URLs, visit each page, and then scrape some data from it.

Basically, my current code is:

    import urllib
    import urlparse

    i = 9
    while i < 118:
        dict = {'start': i}
        url1 = urllib.urlencode(dict)
        url2 = urlparse.urljoin('http://intelligencesquaredus.org/debates/past-debates ', url1)
        print url2
        i = i + 9


Which yields:

http://intelligencesquaredus.org/debates/past-debates/start=9
http://intelligencesquaredus.org/debates/past-debates/start=18
http://intelligencesquaredus.org/debates/past-debates/start=27


But I want the link to be:

http://intelligencesquaredus.org/debates/past-debates?start=9

Any help would be appreciated.
Thanks in advance

Answer

Use:

    url2 = '?'.join(('http://intelligencesquaredus.org/debates/past-debates ' + url1).split(' '))

In the above snippet, you take the URL as a string (note its trailing space) and append the query string (url1) to it.

You then split on the space to get a list with two elements, the base URL and the query string, and join them with ?.
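The split-and-join trick works, but it may help to see why urljoin produced a path in the first place: it treats its second argument as a relative reference, so a bare "start=9" is read as a path segment. A reference that begins with "?" is read as a query instead. The sketch below assumes Python 3 (where the functions live in urllib.parse; in Python 2 they are urllib.urlencode and urlparse.urljoin), and page_url is a hypothetical helper name, not part of the original code.

```python
# Python 3 imports; Python 2 would use urllib.urlencode and urlparse.urljoin.
from urllib.parse import urlencode, urljoin

BASE_URL = "http://intelligencesquaredus.org/debates/past-debates"


def page_url(start):
    """Hypothetical helper: return BASE_URL with a ?start=<start> query."""
    query = urlencode({"start": start})  # e.g. "start=9"
    # Prefixing "?" makes urljoin replace only the query part of the base URL.
    return urljoin(BASE_URL, "?" + query)


print(page_url(9))
```

Note that the base URL here has no trailing space, so nothing needs to be split afterwards.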

Alternative (suggested by deloz) :

base_url = "http://intelligencesquaredus.org/debates/past-debates"
for a in([''.join((base_url, '?', 'start=', str(i))) for i in range(9, 118, 9)]):
    print(a)
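A variant of the same idea, as a minimal sketch: let urlencode build the query string, so values are escaped properly if more parameters are added later. This assumes Python 3; build_page_urls and its default arguments are hypothetical names, chosen to mirror the 9-to-117 range used in the question.

```python
from urllib.parse import urlencode  # Python 2: from urllib import urlencode

BASE_URL = "http://intelligencesquaredus.org/debates/past-debates"


def build_page_urls(first=9, stop=118, step=9):
    """Hypothetical helper: one URL per page offset in range(first, stop, step)."""
    return [BASE_URL + "?" + urlencode({"start": i})
            for i in range(first, stop, step)]


urls = build_page_urls()
for url in urls:
    print(url)
```

The resulting list can then be handed to whichever scraping code fetches each page.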