tomwu - 1 year ago
Python Question

Parsing a Wikipedia dump

For example using this Wikipedia dump:

Is there an existing Python library that I can use to build an array mapping subjects to values?

For example:

{height_ft, 6}, {nationality, American}

Answer Source

It looks like what you really want is to parse MediaWiki markup. There is a Python library designed for this purpose called mwlib. You can use Python's built-in XML packages to extract the page content from the API's response, then pass that content into mwlib's parser to produce an object representation that you can browse and analyse in code to extract the information you want. mwlib is BSD licensed.
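
As a rough sketch of the first step (pulling the page text out of the XML), something like the following should work with only the standard library. The sample XML, the function names, and the regex are my own assumptions for illustration. The field extraction here is a naive stand-in and is not mwlib's API: it only handles flat, one-line `| key = value` template parameters, which is exactly the limitation a real MediaWiki parser such as mwlib removes.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a MediaWiki API XML response; a real
# response or dump will be larger and may contain nested templates.
SAMPLE_XML = """<api><query><pages><page title="Example Person">
<revisions><rev>{{Infobox person
| name = Example Person
| height_ft = 6
| nationality = American
}}
'''Example Person''' is a placeholder article.</rev></revisions>
</page></pages></query></api>"""


def extract_wikitext(xml_string):
    """Return the wikitext of every <rev> (API) or <text> (dump) element."""
    root = ET.fromstring(xml_string)
    texts = []
    for elem in root.iter():
        # Drop any XML namespace prefix such as '{http://...}text'.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name in ("rev", "text") and elem.text:
            texts.append(elem.text)
    return texts


# Matches simple one-line '| key = value' template parameters only.
PARAM_RE = re.compile(
    r"^[ \t]*\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE
)


def naive_infobox_fields(wikitext):
    """Naive stand-in for a real parser: collect one-line parameters."""
    return {key: value for key, value in PARAM_RE.findall(wikitext)}


fields = naive_infobox_fields(extract_wikitext(SAMPLE_XML)[0])
print(fields)
```

For real articles, values often contain nested templates, links, and references, so the regex step is the part to replace with a proper parser once the wikitext has been extracted.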
