Daniel Yang - 1 month ago
Python Question

Python data structure suggestion?

I have complex pricing data with a tree-like structure.

For example, computer prices (monitor price, motherboard price, etc.): within the monitors category I have sub-categories, and under those sub-categories there are further categories (e.g. a monitor that is 27 inch, made by Dell, and curved).

I need to read this pricing information frequently (read-only), thousands of times.

I want to use a class to store this information, because I don't know whether I can do it with dictionaries. Does anyone have a suggestion?

Answer

MongoDB is definitely a good option, but in your case, with only 50 entries that you only need to read, it's probably overkill, especially since you would first have to learn how to write queries for it.

A quicker approach is most likely pandas: store the data as a nested dictionary, ideally by creating an input JSON file or string (as in the example below), and then read it into a pandas DataFrame.
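A minimal sketch of the nested-dictionary idea, assuming hypothetical categories and prices (none of these names or numbers come from the question): the tree is kept as a JSON string, parsed once with json.loads, and then read with plain key lookups. The helper get_price is also hypothetical.

```python
import json

# Hypothetical pricing tree; category names and prices are made up for illustration.
PRICING_JSON = """
{
  "monitor": {
    "27 inch": {"dell": {"curved": 329.0, "flat": 279.0}},
    "24 inch": {"dell": {"flat": 189.0}}
  },
  "motherboard": {"atx": 149.0, "micro-atx": 99.0}
}
"""

# Parse once at startup; afterwards every read is an ordinary dict lookup.
pricing = json.loads(PRICING_JSON)


def get_price(tree, *path):
    """Hypothetical helper: walk the nested dicts along `path` and return what is there."""
    node = tree
    for key in path:
        node = node[key]
    return node


print(get_price(pricing, "monitor", "27 inch", "dell", "curved"))  # 329.0
```

Since the data is only read, parsing it once and reusing the resulting dict keeps each of the thousands of lookups cheap; no class is strictly needed to hold the tree.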

You can then normalize (flatten) it the way you want and do the necessary calculations in pandas, which is much quicker to learn.
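One way to flatten a tree whose branches have different depths (a sketch, not the only approach) is to turn every root-to-leaf path into a row. The helper flatten_prices, the column names level_0, level_1, ... and the prices are all hypothetical.

```python
import pandas as pd

# Hypothetical pricing tree (same shape as a nested JSON document).
pricing = {
    "monitor": {
        "27 inch": {"dell": {"curved": 329.0, "flat": 279.0}},
        "24 inch": {"dell": {"flat": 189.0}},
    },
    "motherboard": {"atx": 149.0, "micro-atx": 99.0},
}


def flatten_prices(tree, path=()):
    """Yield (path, price) for every leaf; paths may have different lengths."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from flatten_prices(value, path + (key,))
        else:
            yield path + (key,), value


rows = [
    {**{f"level_{i}": part for i, part in enumerate(path)}, "price": price}
    for path, price in flatten_prices(pricing)
]
# Shorter paths leave their deeper level columns as NaN.
df = pd.DataFrame(rows)

# Example calculation: total of all monitor prices.
monitor_total = df.loc[df["level_0"] == "monitor", "price"].sum()
print(monitor_total)  # 797.0
```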

Here is an example of how it could look (from the pandas json_normalize documentation): http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html

>>> data = [{'state': 'Florida',
...          'shortname': 'FL',
...          'info': {
...               'governor': 'Rick Scott'
...          },
...          'counties': [{'name': 'Dade', 'population': 12345},
...                      {'name': 'Broward', 'population': 40000},
...                      {'name': 'Palm Beach', 'population': 60000}]},
...         {'state': 'Ohio',
...          'shortname': 'OH',
...          'info': {
...               'governor': 'John Kasich'
...          },
...          'counties': [{'name': 'Summit', 'population': 1234},
...                       {'name': 'Cuyahoga', 'population': 1337}]}]
>>> import pandas as pd
>>> # json_normalize is top-level in pandas >= 1.0 (older versions: pandas.io.json)
>>> result = pd.json_normalize(data, 'counties', ['state', 'shortname',
...                                              ['info', 'governor']])
>>> result
         name  population    state shortname info.governor
0        Dade       12345  Florida        FL    Rick Scott
1     Broward       40000  Florida        FL    Rick Scott
2  Palm Beach       60000  Florida        FL    Rick Scott
3      Summit        1234     Ohio        OH   John Kasich
4    Cuyahoga        1337     Ohio        OH   John Kasich
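The same record_path/meta pattern could be applied to pricing data if each sub-category holds a list of priced items. A sketch with hypothetical fields (category, size, items, brand, curved, price), using the top-level pd.json_normalize from pandas 1.0 and later:

```python
import pandas as pd

# Hypothetical pricing records: each monitor size carries a list of priced items.
data = [
    {"category": "monitor", "size": "27 inch",
     "items": [{"brand": "dell", "curved": True, "price": 329.0},
               {"brand": "dell", "curved": False, "price": 279.0}]},
    {"category": "monitor", "size": "24 inch",
     "items": [{"brand": "dell", "curved": False, "price": 189.0}]},
]

# One row per item; the parent "category" and "size" values are repeated on each row.
prices = pd.json_normalize(data, record_path="items", meta=["category", "size"])

# Filter on several levels of the original tree at once.
curved_27 = prices[(prices["size"] == "27 inch") & prices["curved"]]
print(curved_27["price"].tolist())  # [329.0]
```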

For the above DataFrame, you can easily get the total population of all entries with shortname == 'FL' as follows:

>>> sum_of_fl_population = result.loc[result['shortname'] == 'FL', 'population'].sum()
>>> print(sum_of_fl_population)
112345
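To get the total for every state at once instead of writing one filter per state, groupby can be used. A short sketch that rebuilds the same data so it runs on its own:

```python
import pandas as pd

data = [{"state": "Florida", "shortname": "FL", "info": {"governor": "Rick Scott"},
         "counties": [{"name": "Dade", "population": 12345},
                      {"name": "Broward", "population": 40000},
                      {"name": "Palm Beach", "population": 60000}]},
        {"state": "Ohio", "shortname": "OH", "info": {"governor": "John Kasich"},
         "counties": [{"name": "Summit", "population": 1234},
                      {"name": "Cuyahoga", "population": 1337}]}]

result = pd.json_normalize(data, "counties", ["state", "shortname", ["info", "governor"]])

# One sum per state abbreviation, computed in a single pass.
totals = result.groupby("shortname")["population"].sum()
for state, total in totals.items():
    print(state, total)  # FL 112345, then OH 2571
```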

Have a look at this link for an introduction to handling pandas DataFrames; pandas is probably the best way to solve your problem: http://pandas.pydata.org/pandas-docs/stable/10min.html