user4943236 - 3 months ago
Python Question

Pandas: read_html

I'm trying to extract the list of US states from a Wikipedia page using Python pandas.

import pandas as pd
import html5lib
f_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')


However, the above code gives me an error:


ImportError Traceback (most recent call last)
in ()
1 import pandas as pd
----> 2 f_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')

if flavor in ('bs4', 'html5lib'):
662 if not _HAS_HTML5LIB:
--> 663 raise ImportError("html5lib not found, please install it")
664 if not _HAS_BS4:
665 raise ImportError("BeautifulSoup4 (bs4) not found, please install it")
ImportError: html5lib not found, please install it


I have installed html5lib and beautifulsoup4 as well, but it still doesn't work.
Can someone help, please?

Answer

Running Python 3.4 on a Mac.

In a new pyvenv virtual environment:

pip install pandas
pip install lxml
pip install html5lib
pip install BeautifulSoup4
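The steps above (a fresh environment, then pandas plus the three parser libraries) can also be scripted with the standard-library `venv` module, which the `pyvenv` command wrapped in Python 3.4 (`pyvenv` was later deprecated in favor of `python -m venv`). This is a minimal sketch, not the answerer's exact procedure: `env_python`, `create_env`, `install_cmd` and `install` are hypothetical helper names, and the `bin`/`Scripts` paths assume the standard venv directory layout.

```python
import os
import subprocess
import sys
import venv

# Packages installed in the answer; pip treats package names case-insensitively,
# so "beautifulsoup4" and "BeautifulSoup4" refer to the same package.
PACKAGES = ["pandas", "lxml", "html5lib", "beautifulsoup4"]


def env_python(env_dir):
    """Return the path of the interpreter inside the virtual environment env_dir."""
    if sys.platform == "win32":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def create_env(env_dir, with_pip=True):
    """Create a fresh virtual environment (what `pyvenv env_dir` did) and return its interpreter path."""
    venv.create(env_dir, with_pip=with_pip)
    return env_python(env_dir)


def install_cmd(env_dir, packages=PACKAGES):
    """Build a pip command that runs under the environment's own interpreter,
    so the packages land in that environment and not in some other Python."""
    return [env_python(env_dir), "-m", "pip", "install"] + list(packages)


def install(env_dir, packages=PACKAGES):
    """Run the pip install (requires network access and a pip-enabled environment)."""
    subprocess.check_call(install_cmd(env_dir, packages))
```

For example, `create_env("states-env")` followed by `install("states-env")` sets up the environment; running the `read_html` example with the interpreter path that `create_env` returns keeps the install and the run in the same environment.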

Then I ran your example:

import pandas as pd
import html5lib
f_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')

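It all works, so the error most likely means html5lib and BeautifulSoup4 were installed into a different Python environment than the one running your code.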
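An "html5lib not found" error after installing html5lib usually points to a mismatch between the interpreter that pip installed into and the interpreter running the code (for example, a system pip versus a notebook kernel). A minimal, standard-library-only sketch for checking what the running interpreter can actually import follows; `PARSER_MODULES`, `parser_status` and `report` are hypothetical names chosen for illustration, and the module list is taken from the parsers mentioned in the question and answer.

```python
import importlib.util
import sys

# Parser libraries relevant to pandas.read_html in this thread: lxml, plus
# html5lib and bs4 (both required by the 'bs4'/'html5lib' flavor, per the traceback).
PARSER_MODULES = ("lxml", "html5lib", "bs4")


def parser_status(modules=PARSER_MODULES):
    """Map each top-level module name to True if the *running* interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def report():
    """Print the running interpreter's path and which parser modules it can see."""
    print("Interpreter: {}".format(sys.executable))
    for name, found in parser_status().items():
        print("  {}: {}".format(name, "found" if found else "MISSING"))


report()
```

Run it in the same script or notebook where `read_html` fails. If a module shows as MISSING, installing it with that same interpreter (`python -m pip install html5lib beautifulsoup4`, where `python` is the path printed as `sys.executable`) targets the environment the code actually uses.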