D. Wang - 3 months ago
Python Question

Issues with Python pandas: read_html and python3-lxml installation

I'm trying to run the following code, to no avail. To my knowledge, there aren't any syntax errors.

import quandl
import pandas as pd

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
print(fifty_states)


I'm getting the following error when I run this code:


Traceback (most recent call last):

File "C:/Users/Dave/Documents/Python Files/helloworld.py", line 15, in <module>
fiddy_states = pd.read_html('http://simple.wikipedia.org/wiki/List_of_U.S._states')

File "C:\Python35\lib\site-packages\pandas\io\html.py", line 874, in read_html
parse_dates, tupleize_cols, thousands, attrs, encoding)

File "C:\Python35\lib\site-packages\pandas\io\html.py", line 726, in _parse
parser = _parser_dispatch(flav)

File "C:\Python35\lib\site-packages\pandas\io\html.py", line 685, in _parser_dispatch
raise ImportError("lxml not found, please install it")

ImportError: lxml not found, please install it


Not too sure why this is occurring, as I (should) have all the packages required to run this code. I have problems installing lxml and python3-lxml, as the packages fail to install. As a backup, I've installed the following:


python-dev libxml2-dev libxslt1-dev zlib1g-dev


in addition to 'html5lib', which I've read is a suitable replacement to lxml.

I'm not sure what else to do at this point, since the fixes I've found for similar problems (i.e. installing lxml) don't apply to me: I can't install lxml in any form via pip on the command line.

Any help is much appreciated.

Edit: It appears that lxml was never installed on my computer. It's weird, because I'm unable to install it via pip install lxml. Here are the error logs I get when attempting an install:

Collecting lxml
Using cached lxml-3.6.4.tar.gz
Building wheels for collected packages: lxml
Running setup.py bdist_wheel for lxml ... error
Complete output from command c:\python35\python.exe -u -c "import setuptools,
tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\l
xml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().rep
lace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d C:\Users\Dwang\AppData\Lo
cal\Temp\tmpm9z4yol6pip-wheel- --python-tag cp35:
Building lxml version 3.6.4.
Building without Cython.
ERROR: b"'xslt-config' is not recognized as an internal or external command,\r
\noperable program or batch file.\r\n"
** make sure the development packages of libxml2 and libxslt are installed **

Using build configuration of libxslt
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.5
creating build\lib.win-amd64-3.5\lxml
copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
creating build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includes

creating build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
creating build\lib.win-amd64-3.5\lxml\isoschematron
copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\iso
schematron
copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\include
s
copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includes

copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\include
s
copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\include
s
copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\include
s
copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inclu
des
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib.w
in-amd64-3.5\lxml\isoschematron\resources\rng
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win-a
md64-3.5\lxml\isoschematron\resources\xsl
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematr
on-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstract
_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sche
matron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_inc
lude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schemat
ron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-s
chematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schematr
on_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resource
s\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_for
_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt -
> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt1
running build_ext
building 'lxml.etree' extension
error: Unable to find vcvarsall.bat

----------------------------------------
Failed building wheel for lxml
Running setup.py clean for lxml
Failed to build lxml
Installing collected packages: lxml
Running setup.py install for lxml ... error
Complete output from command c:\python35\python.exe -u -c "import setuptools
, tokenize;__file__='C:\\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\
\lxml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().r
eplace('\r\n', '\n'), __file__, 'exec'))" install --record C:\Users\Dwang\AppDat
a\Local\Temp\pip-4_tf2u3a-record\install-record.txt --single-version-externally-
managed --compile:
Building lxml version 3.6.4.
Building without Cython.
ERROR: b"'xslt-config' is not recognized as an internal or external command,
\r\noperable program or batch file.\r\n"
** make sure the development packages of libxml2 and libxslt are installed *
*

Using build configuration of libxslt
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.5
creating build\lib.win-amd64-3.5\lxml
copying src\lxml\builder.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\cssselect.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\doctestcompare.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\ElementInclude.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\pyclasslookup.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\sax.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\usedoctest.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\_elementpath.py -> build\lib.win-amd64-3.5\lxml
copying src\lxml\__init__.py -> build\lib.win-amd64-3.5\lxml
creating build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\__init__.py -> build\lib.win-amd64-3.5\lxml\includ
es
creating build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\builder.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\clean.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\defs.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\diff.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\ElementSoup.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\formfill.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\html5parser.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\soupparser.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\usedoctest.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\_diffcommand.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\_html5builder.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\_setmixin.py -> build\lib.win-amd64-3.5\lxml\html
copying src\lxml\html\__init__.py -> build\lib.win-amd64-3.5\lxml\html
creating build\lib.win-amd64-3.5\lxml\isoschematron
copying src\lxml\isoschematron\__init__.py -> build\lib.win-amd64-3.5\lxml\i
soschematron
copying src\lxml\lxml.etree.h -> build\lib.win-amd64-3.5\lxml
copying src\lxml\lxml.etree_api.h -> build\lib.win-amd64-3.5\lxml
copying src\lxml\includes\c14n.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\config.pxd -> build\lib.win-amd64-3.5\lxml\include
s
copying src\lxml\includes\dtdvalid.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
copying src\lxml\includes\etreepublic.pxd -> build\lib.win-amd64-3.5\lxml\in
cludes
copying src\lxml\includes\htmlparser.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
copying src\lxml\includes\relaxng.pxd -> build\lib.win-amd64-3.5\lxml\includ
es
copying src\lxml\includes\schematron.pxd -> build\lib.win-amd64-3.5\lxml\inc
ludes
copying src\lxml\includes\tree.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\uri.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\xinclude.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
copying src\lxml\includes\xmlerror.pxd -> build\lib.win-amd64-3.5\lxml\inclu
des
copying src\lxml\includes\xmlparser.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
copying src\lxml\includes\xmlschema.pxd -> build\lib.win-amd64-3.5\lxml\incl
udes
copying src\lxml\includes\xpath.pxd -> build\lib.win-amd64-3.5\lxml\includes

copying src\lxml\includes\xslt.pxd -> build\lib.win-amd64-3.5\lxml\includes
copying src\lxml\includes\etree_defs.h -> build\lib.win-amd64-3.5\lxml\inclu
des
copying src\lxml\includes\lxml-version.h -> build\lib.win-amd64-3.5\lxml\inc
ludes
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\rng
copying src\lxml\isoschematron\resources\rng\iso-schematron.rng -> build\lib
.win-amd64-3.5\lxml\isoschematron\resources\rng
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\RNG2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
copying src\lxml\isoschematron\resources\xsl\XSD2Schtrn.xsl -> build\lib.win
-amd64-3.5\lxml\isoschematron\resources\xsl
creating build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schema
tron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_abstra
ct_expand.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sc
hematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_dsdl_i
nclude.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schem
atron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_message.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso
-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_schema
tron_skeleton_for_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resour
ces\xsl\iso-schematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\iso_svrl_f
or_xslt1.xsl -> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-sch
ematron-xslt1
copying src\lxml\isoschematron\resources\xsl\iso-schematron-xslt1\readme.txt
-> build\lib.win-amd64-3.5\lxml\isoschematron\resources\xsl\iso-schematron-xslt
1
running build_ext
building 'lxml.etree' extension
error: Unable to find vcvarsall.bat

----------------------------------------
Command "c:\python35\python.exe -u -c "import setuptools, tokenize;__file__='C:\
\Users\\Dwang\\AppData\\Local\\Temp\\pip-build-738bf61u\\lxml\\setup.py';exec(co
mpile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __
file__, 'exec'))" install --record C:\Users\Dwang\AppData\Local\Temp\pip-4_tf2u3
a-record\install-record.txt --single-version-externally-managed --compile" faile
d with error code 1 in C:\Users\Dwang\AppData\Local\Temp\pip-build-738bf61u\lxml
\

Answer

From what I understand, and according to the docs, if read_html() fails to use lxml it should fall back to html5lib, but it looks like that does not happen in your case and an error is thrown instead.
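Before changing the call, it may help to confirm which parsers your interpreter can actually import. The following is only a rough sketch: the names FLAVOR_REQUIREMENTS, is_installed and usable_flavors are made up for illustration, and the mapping assumes that the html5lib flavor needs both html5lib and bs4 (BeautifulSoup4), which is what the pandas docs describe.

```python
import importlib.util

# Assumed mapping (illustrative): modules each read_html flavor needs to import.
FLAVOR_REQUIREMENTS = {
    "lxml": ("lxml",),
    "html5lib": ("html5lib", "bs4"),
}


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def usable_flavors(check=is_installed):
    """List the flavors whose required modules all pass the given check."""
    return [
        flavor
        for flavor, modules in FLAVOR_REQUIREMENTS.items()
        if all(check(name) for name in modules)
    ]


if __name__ == "__main__":
    # Run this with the same python.exe that runs your script.
    print(usable_flavors())
```

If html5lib is missing from the printed list, the explicit flavor suggested here would most likely fail with a similar ImportError, so html5lib and bs4 would need to be installed for that interpreter first.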

Try to explicitly state the flavor:

fifty_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states', flavor='html5lib')
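If you would rather not hard-code one parser, the fallback can also be done by hand. This is a minimal sketch, not how pandas does it internally: read_tables and its reader parameter are hypothetical names, and it relies only on the behaviour visible in your traceback, namely that read_html raises ImportError when the requested parser is missing.

```python
def read_tables(source, flavors=("lxml", "html5lib"), reader=None):
    """Try each flavor in order and return the tables from the first one that imports."""
    if reader is None:
        # Imported lazily so the helper itself does not require pandas to be loaded.
        import pandas as pd
        reader = pd.read_html

    last_error = None
    for flavor in flavors:
        try:
            return reader(source, flavor=flavor)
        except ImportError as exc:
            # The parser behind this flavor is not installed; try the next one.
            last_error = exc

    raise ImportError(
        "none of the requested parsers could be imported: " + ", ".join(flavors)
    ) from last_error
```

With that in place, the call becomes fifty_states = read_tables('https://simple.wikipedia.org/wiki/List_of_U.S._states'), and lxml is picked up automatically once you manage to install it.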