Sitz Blogz - 9 months ago
Python Question

urllib2.HTTPError: While Web Scraping a huge list

The web page has a huge list of journal names along with other details. I am trying to scrape the table content into a pandas DataFrame.


import bs4 as bs
import urllib  # Using Python 2.7
import pandas as pd

dfs = pd.read_html('', header=0)
for df in dfs:
    df.to_csv('citefactor_list.csv', header=True)

But I am getting the following error. I tried referring to some questions that were already asked, but could not fix it.


Traceback (most recent call last):
File "", line 7, in <module>
dfs = pd.read_html('', header=0)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 896, in read_html
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 733, in _parse
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 727, in _parse
tables = p.parse_tables()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 196, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 450, in _build_doc
return BeautifulSoup(self._setup_build_doc(), features='html5lib',
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 443, in _setup_build_doc
raw_text = _read(
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 130, in _read
with urlopen(obj) as url:
File "/usr/lib/python2.7/", line 17, in __enter__
File "/usr/local/lib/python2.7/dist-packages/pandas/io/", line 60, in urlopen
with closing(_urlopen(*args, **kwargs)) as f:
File "/usr/lib/python2.7/", line 127, in urlopen
return, data, timeout)
File "/usr/lib/python2.7/", line 410, in open
response = meth(req, response)
File "/usr/lib/python2.7/", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/", line 448, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error

Answer

A 500 Internal Server Error means something went wrong on the server, which is normally out of your control.

However, the actual problem here is that you are using the wrong URL.

If you open the URL from your script in a browser, you get a 404 Not Found error. Remove the trailing slash from the URL and it will work.
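To check this kind of problem from code rather than from the browser, something like the following rough sketch could help. It tries the URL as given and, if the server answers with an HTTP error, retries without the trailing slash. The helper names `url_variants` and `fetch_html` are made up for this illustration (they are not part of pandas or urllib), and the `opener` parameter is only there so the function can be exercised without a real network call.

```python
try:
    # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError
except ImportError:
    # Python 2.7, as used in the question
    from urllib2 import urlopen, HTTPError


def url_variants(url):
    """Return the URL as given, followed by the same URL without trailing slashes.

    Hypothetical helper for this example only.
    """
    variants = [url]
    stripped = url.rstrip('/')
    if stripped != url:
        variants.append(stripped)
    return variants


def fetch_html(url, opener=urlopen):
    """Try each URL variant in turn and return (working_url, raw_html).

    Hypothetical helper for this example only. If every variant fails
    with an HTTPError, the last error is re-raised.
    """
    last_error = None
    for candidate in url_variants(url):
        try:
            response = opener(candidate)
            try:
                return candidate, response.read()
            finally:
                response.close()
        except HTTPError as err:
            # Report which variant failed and with which status code.
            print('%s -> HTTP %s' % (candidate, err.code))
            last_error = err
    raise last_error
```

Calling `fetch_html` with your URL would show which form of the address the server accepts. The returned HTML can then be passed to `pd.read_html(..., header=0)` instead of the URL (on recent pandas versions, decode it and wrap it in `io.StringIO` first).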