Acorn, 4 years ago
Python Question

Getting unpredictable data into a tabular format

The situation:

Each page I scrape has <input> elements with a title= and a value= attribute.

I don't know what is going to be on the page.

I want to have all my collected data in a single table at the end, with a column for each title.

So basically, I need each row of data to line up with all the others, and if a row doesn't have a certain element, then it should be blank (but there must be something there to keep the alignment).

e.g.

First page has:
{animal: cat, colour: blue, fruit: lemon, day: monday}


Second page has:
{animal: fish, colour: green, day: saturday}


Third page has:
{animal: dog, number: 10, colour: yellow, fruit: mango, day: tuesday}


Then my resulting table should be:

animal | number | colour | fruit | day
cat | none | blue | lemon | monday
fish | none | green | none | saturday
dog | 10 | yellow | mango | tuesday


It would also be good to keep the order of the title/value pairs, which I know dictionaries won't do.

So basically, I need to generate the columns from all the titles (kept in order, but somehow merged together).

What would be the best way of going about this without knowing all the possible titles and explicitly specifying an order for the values to be put in?
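For the "kept in order but merged together" part, here is a minimal sketch of one possible heuristic; it is an assumption, not something from the post. It relies on Python 3.7+, where plain dicts do preserve insertion order, so each page can stay a dict of title to value. A title that has not been seen before is slotted in right after the title that precedes it on the current page. The name merge_titles is hypothetical.

```python
def merge_titles(pages):
    """Merge the titles of several pages into a single column order.

    Greedy heuristic (an assumption): titles keep their first-seen position,
    and a new title is inserted right after the title that precedes it on
    the current page, or at the front if it is the page's first title.
    """
    order = []
    for page in pages:
        prev = None
        for title in page:  # dicts iterate in insertion order on Python 3.7+
            if title not in order:
                pos = order.index(prev) + 1 if prev is not None else 0
                order.insert(pos, title)
            prev = title
    return order


# The three example pages from the question; scraped values are strings.
pages = [
    {"animal": "cat", "colour": "blue", "fruit": "lemon", "day": "monday"},
    {"animal": "fish", "colour": "green", "day": "saturday"},
    {"animal": "dog", "number": "10", "colour": "yellow", "fruit": "mango", "day": "tuesday"},
]
print(merge_titles(pages))
# ['animal', 'number', 'colour', 'fruit', 'day']
```

On the example pages this reproduces the column order of the table above. It is only a greedy heuristic: if different pages list titles in conflicting orders, the order seen first wins.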

Answer

You need a multipass algorithm. Keep all the scraped pages in a list of dicts. In the first pass, go over this list, collect all the titles in a set(), and create an ordering from them (for example, convert the set to a list and sort it alphabetically).
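The first pass might look like this in Python, assuming each scraped page has already been turned into a dict mapping title to value (collect_titles is just an illustrative name):

```python
def collect_titles(pages):
    """First pass: gather every title seen on any page, in alphabetical order."""
    titles = set()
    for page in pages:
        titles.update(page)  # iterating a dict yields its keys, i.e. the titles
    return sorted(titles)


pages = [
    {"animal": "cat", "colour": "blue", "fruit": "lemon", "day": "monday"},
    {"animal": "fish", "colour": "green", "day": "saturday"},
    {"animal": "dog", "number": "10", "colour": "yellow", "fruit": "mango", "day": "tuesday"},
]
print(collect_titles(pages))
# ['animal', 'colour', 'day', 'fruit', 'number']
```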

In the second pass, print the table using your generated ordering as the column names, extracting each value from the dictionaries as needed and defaulting to an empty string for missing values, for example with dict.get(name, "").
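A sketch of the second pass, again assuming a list of dicts; the helper names and the " | " separator (borrowed from the question's table) are illustrative choices. The alphabetical column order is recomputed in one line so the block runs on its own.

```python
def build_rows(pages, columns, missing=""):
    """Second pass: one row per page, one cell per column, padded where a title is absent."""
    return [[str(page.get(name, missing)) for name in columns] for page in pages]


def format_table(pages, columns, missing=""):
    """Render a header line plus one ' | '-separated line per page."""
    lines = [" | ".join(columns)]
    lines += [" | ".join(row) for row in build_rows(pages, columns, missing)]
    return "\n".join(lines)


pages = [
    {"animal": "cat", "colour": "blue", "fruit": "lemon", "day": "monday"},
    {"animal": "fish", "colour": "green", "day": "saturday"},
    {"animal": "dog", "number": "10", "colour": "yellow", "fruit": "mango", "day": "tuesday"},
]
columns = sorted(set().union(*pages))  # the first pass, condensed into one line
print(format_table(pages, columns, missing="none"))
# animal | colour | day | fruit | number
# cat | blue | monday | lemon | none
# fish | green | saturday | none | none
# dog | yellow | tuesday | mango | 10
```

Passing missing="none" gives the placeholder style used in the question; the default "" leaves the cell blank. If the goal is a CSV file rather than printed text, csv.DictWriter(f, fieldnames=columns) performs the same padding through its restval parameter, which defaults to an empty string.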
