windboy - 2 years ago
JSON Question

Edit: corrected the URL.

I have written a simple script that reads keywords from a file, inserts each one into a URL, and retrieves the JSON that URL returns.

Below is the script that I have written:

import urllib2
import json

f1 = open('CatList.text', 'r')
f2 = open('SubList.text', 'w')
lines = f1.read().splitlines()  # one keyword per line, trailing newlines removed

for line in lines:
    url = '' + line + '&cmlimit=100'
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)
    for item in data['query']:
        for i in data['query']['categorymembers']:
            print i['title']
    print '-----------------------------------------'
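For reference, a minimal Python 3 sketch of the title-extraction step, assuming the response has the nesting the script reads (data['query']['categorymembers'], each member carrying a 'title'). The sample JSON string is hypothetical, not real API output, and category_titles is an illustrative helper name. Because the titles sit directly under that key, one loop is enough; the outer for item in data['query'] loop only repeats the same output once per key in data['query'].

```python
import json


def category_titles(data):
    """Return the title of every entry under data['query']['categorymembers']."""
    return [member['title'] for member in data['query']['categorymembers']]


# Hypothetical response body with the same nesting the script expects.
sample = '{"query": {"categorymembers": [{"title": "Geography"}, {"title": "Topography"}]}}'

for title in category_titles(json.loads(sample)):
    print(title)
print('-----------------------------------------')
```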

The script first reads CatList.text, which provides the list of keywords used in the URL.

Here is a sample of what CatList.text contains:

Category:Branches of geography
Category:Geography by place
Category:Geography awards and competitions
Category:Geography conferences
Category:Geography education
Category:Environmental studies
Category:Geographical zones
Category:Geopolitical corridors
Category:History of geography
Category:Land systems
Category:Geography-related lists
Category:Lists of countries by geography
Category:Geography organizations
Category:Geographical regions
Category:Geographical technology
Category:Geography terminology
Category:Works about geography
Category:Geographic images
Category:Geography stubs
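Each line read from a file keeps its trailing newline, which would otherwise end up inside the URL. A minimal Python 3 sketch of loading the keywords; read_keywords is a hypothetical helper name, and the demo writes two of the sample lines to a temporary file instead of reading the real CatList.text.

```python
import os
import tempfile


def read_keywords(path):
    """Return one keyword per non-blank line, without the trailing newline."""
    with open(path) as f:
        # strip() removes the '\n' at the end of each line (and any stray spaces
        # at either end); blank lines are skipped entirely.
        return [line.strip() for line in f if line.strip()]


# Demo: two sample entries plus a blank line, written to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.text', delete=False) as tmp:
    tmp.write('Category:Branches of geography\nCategory:Geography by place\n\n')

keywords = read_keywords(tmp.name)
os.remove(tmp.name)
print(keywords)  # ['Category:Branches of geography', 'Category:Geography by place']
```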

My program reads each keyword and places it in the URL.

However, I am not able to get the result. I have checked the code by printing the URLs:

import urllib2
import json

f1 = open('CatList.text', 'r')
f2 = open('SubList2.text', 'w')
lines = f1.read().splitlines()  # one keyword per line, trailing newlines removed

for line in lines:
    url = '' + line + '&cmlimit=100'
    json_obj = urllib2.urlopen(url)
    data = json.load(json_obj)


The result I get in SubList2.text is as follows:

of geography&cmlimit=100
by place&cmlimit=100
awards and competitions&cmlimit=100
conferences&cmlimit=100
education&cmlimit=100
studies&cmlimit=100
zones&cmlimit=100
corridors&cmlimit=100
of geography&cmlimit=100
systems&cmlimit=100
lists&cmlimit=100
of countries by geography&cmlimit=100
organizations&cmlimit=100
regions&cmlimit=100
technology&cmlimit=100
terminology&cmlimit=100
about geography&cmlimit=100
images&cmlimit=100
stubs&cmlimit=100

This shows that each keyword is being placed into the URL correctly.

But when I run the full script, it does not return the correct result.

One thing I noticed is that when I paste a link into the browser's address bar, for example: of geography&cmlimit=100

it gives the correct result, because the address bar automatically corrects it to:

I believe that if every space in a keyword such as "Category:Branches of geography" is replaced with %20, my script will be able to get the correct JSON items.
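Replacing each space with %20 is percent-encoding, which the standard library already provides. A small Python 3 sketch comparing urllib.parse.quote with a hand-written replacement; the expected values are shown as comments.

```python
from urllib.parse import quote

line = 'Category:Branches of geography'

# quote() percent-encodes characters that are not allowed in a URL: a space
# becomes %20. By default only '/' is left as-is, so ':' becomes %3A.
print(quote(line))               # Category%3ABranches%20of%20geography

# safe=':' keeps the colon readable; the spaces are still encoded.
print(quote(line, safe=':'))     # Category:Branches%20of%20geography

# A plain string replacement handles spaces and nothing else.
print(line.replace(' ', '%20'))  # Category:Branches%20of%20geography
```

The hand-written replace() happens to work for these keywords, but quote() also encodes characters such as & or # that would otherwise be read as part of the URL's own syntax.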

But I am not sure how to modify the url = ... statement in the code above to replace the spaces in each CatList entry with %20.

Please forgive the bad formatting and the long post; I am still learning Python.

Thank you for helping me.


Thank you, Tim. Your solution works:

url = '' + urllib2.quote(line) + '&cmlimit=100'

It now prints the correct result.

Answer

Use urllib.quote() (Python 2) or urllib.parse.quote() (Python 3) to percent-encode special characters, such as spaces, in a URL:

Python 2:

import urllib
line = 'Category:Branches of geography'
# quote() percent-encodes the spaces (and other special characters)
url = '' + urllib.quote(line) + '&cmlimit=100'

Python 3:

import urllib.parse
line = 'Category:Branches of geography'
# in Python 3, quote() lives in urllib.parse
url = '' + urllib.parse.quote(line) + '&cmlimit=100'
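Putting the pieces together, a minimal Python 3 sketch of the corrected loop body. BASE_URL is a hypothetical placeholder (the real URL prefix is not shown in the post), and build_url and fetch_titles are illustrative helper names. Only build_url runs offline; fetch_titles needs network access to a server that returns the JSON shape used in the question.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical placeholder; substitute the real URL prefix used in the question.
BASE_URL = 'https://example.org/api?category='


def build_url(keyword, base=BASE_URL):
    """Percent-encode the keyword and append the cmlimit parameter."""
    return base + quote(keyword) + '&cmlimit=100'


def fetch_titles(keyword):
    """Fetch one URL and return the category member titles (needs network)."""
    with urlopen(build_url(keyword)) as response:
        data = json.load(response)
    return [member['title'] for member in data['query']['categorymembers']]


print(build_url('Category:Branches of geography'))
# https://example.org/api?category=Category%3ABranches%20of%20geography&cmlimit=100
```

An alternative is urllib.parse.urlencode, which encodes a whole dictionary of query parameters at once; note that by default it encodes spaces as + rather than %20.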
