asdfg asdfg -4 years ago 229
Python Question

Python In-memory table

What is the right way to form an in-memory table in Python with direct lookups for rows and columns?
I thought of using a dict of dicts this way:

class Table(dict):
    def __getitem__(self, key):
        # Create an empty row the first time a missing key is looked up.
        if key not in self:
            self[key] = {}
        return dict.__getitem__(self, key)

table = Table()
table['row1']['column1'] = 'value11'
table['row1']['column2'] = 'value12'
table['row2']['column1'] = 'value21'
table['row2']['column2'] = 'value22'
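A minimal alternative sketch of the same auto-creating lookup: dict subclasses may define __missing__, which dict.__getitem__ calls only when a key is absent, so overriding __getitem__ itself is unnecessary. The class name AutoTable is hypothetical.

```python
class AutoTable(dict):
    """A dict of dicts whose missing rows are created on first access."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is not present.
        row = self[key] = {}
        return row


table = AutoTable()
table['row1']['column1'] = 'value11'
table['row1']['column2'] = 'value12'
table['row2']['column1'] = 'value21'
table['row2']['column2'] = 'value22'

print('row1' in table)                      # True
print('value11' in table['row1'].values())  # True
```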

I had difficulty looking up values in columns.

>>> 'row1' in table
>>> 'value11' in table['row1'].values()

Now how do I look up whether 'column1' has 'value11'?

Is this method of forming tables wrong?
Is there a better way to implement such tables with easier lookups?

Answer

Now how do I look up whether 'column1' has 'value11'?

any(arow.get('column1') == 'value11' for arow in table.values())
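A runnable sketch of that lookup on a plain dict of dicts (the sample data, including the row with no 'column1' cell, is made up for illustration). It iterates over the row dicts via values(), because iterating over items() yields (key, row) pairs rather than rows, and it uses .get() so rows lacking the column do not raise KeyError; this assumes None is never stored as a real cell value.

```python
table = {
    'row1': {'column1': 'value11', 'column2': 'value12'},
    'row2': {'column1': 'value21', 'column2': 'value22'},
    'row3': {'column2': 'value32'},  # no 'column1' cell
}

# Does any row hold 'value11' in 'column1'?
found = any(row.get('column1') == 'value11' for row in table.values())

# Which rows match? items() yields (row_key, row_dict) pairs.
matches = [key for key, row in table.items()
           if row.get('column1') == 'value11']

print(found)    # True
print(matches)  # ['row1']
```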

Is this method of forming tables wrong?

No, it's just very "exposed", perhaps too much -- it could usefully be encapsulated in a class that exposes the methods you need, so that the question of how best to implement them does not affect the rest of your application.

Is there a better way to implement such tables with easier lookups?

Once you have designed a class whose interface you'd like to use, you can experiment with very different implementation approaches and benchmark them on a workload that's representative of your usage pattern, so you can find out what's best for you (assuming table manipulation and lookup are a big part of your application's runtime, of course -- to find out, profile your app).
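As an illustration of that kind of measurement, here is a sketch that times one column lookup against two data layouts using the standard timeit module. The table size, the column names and the column-major "dict of lists" layout are assumptions chosen for the example; substitute a workload representative of your own application before drawing any conclusions.

```python
import timeit

N = 10000  # hypothetical table size; match it to your real data

# Layout A: row-major dict of dicts (the question's layout).
by_row = {i: {'column1': 'v%d' % i, 'column2': i} for i in range(N)}

# Layout B: column-major dict of lists (position i holds row i's value).
by_col = {
    'column1': ['v%d' % i for i in range(N)],
    'column2': list(range(N)),
}


def lookup_by_row():
    # Python-level scan over every row dict.
    return any(r.get('column1') == 'v9999' for r in by_row.values())


def lookup_by_col():
    # list.__contains__ scans too, but in C.
    return 'v9999' in by_col['column1']


assert lookup_by_row() and lookup_by_col()
for fn in (lookup_by_row, lookup_by_col):
    # timeit.timeit accepts a zero-argument callable as its stmt.
    seconds = timeit.timeit(fn, number=100)
    print('%s: %.4f s' % (fn.__name__, seconds))
```

Both lookups are still linear scans; the point is only that the same question can cost very different amounts depending on the internal layout, which is why measuring beats guessing.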

I had similar but not identical needs in a large internal app I maintain at work, except that the row indices are integers (only the column names are strings), the column order is important, and the workload is more about "editing" the table (adding, removing, reordering rows or columns, renaming columns, etc). I started with a table class exposing the functionality I needed, with the simplest rough-and-ready implementation internally (a list of dicts, plus a list of column names for the column ordering); by now I have evolved it (independently of the actual "application-level" parts, but based on profiling and benchmarking thereof) into completely different implementations (currently based on numpy).
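A rough sketch of that rough-and-ready starting point: a list of row dicts plus a list of column names that records column order. The class and method names (ListTable, append_row, insert_column, remove_column, rename_column, move_row, as_lists) are hypothetical, not the author's actual code.

```python
class ListTable(object):
    """Rows are a list of dicts; self.columns records the column order."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []

    def append_row(self, values):
        # values are given in the current column order
        self.rows.append(dict(zip(self.columns, values)))

    def insert_column(self, pos, name, default=None):
        self.columns.insert(pos, name)
        for row in self.rows:
            row[name] = default

    def remove_column(self, name):
        self.columns.remove(name)
        for row in self.rows:
            row.pop(name, None)

    def rename_column(self, old, new):
        self.columns[self.columns.index(old)] = new
        for row in self.rows:
            if old in row:
                row[new] = row.pop(old)

    def move_row(self, src, dst):
        # Row indices are plain list positions.
        self.rows.insert(dst, self.rows.pop(src))

    def as_lists(self):
        # Materialize every row in column order.
        return [[row.get(c) for c in self.columns] for row in self.rows]


lt = ListTable(['a', 'b'])
lt.append_row([1, 2])
lt.append_row([3, 4])
lt.insert_column(1, 'x', default=0)
lt.rename_column('b', 'c')
lt.move_row(1, 0)
print(lt.columns)     # ['a', 'x', 'c']
print(lt.as_lists())  # [[3, 0, 4], [1, 0, 2]]
```

Every column edit here loops over all rows, so its cost grows with the number of rows; that is the kind of cost profiling would surface before a switch to a different internal representation.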

I think you should proceed along similar lines: "clothe" your current implementation into a nice "interface" with all the methods you need, profile your app -- unless this table object is a performance bottleneck, you're done; if it is a bottleneck, you can optimize the implementation (experiment, measure, repeat;-) without disturbing any of the rest of your application.

Inheriting from dict is not a good idea because you probably don't want to expose all of dict's rich functionality; plus, what you're doing is, roughly, an inefficient implementation of collections.defaultdict(dict). So, encapsulate the latter:

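import collections

class Table(object):
    def __init__(self):
        # Rows are created on demand the first time they are written to.
        self.d = collections.defaultdict(dict)
    def add(self, row, col, val):
        self.d[row][col] = val
    def get(self, row, col, default=None):
        # self.d.get() avoids inserting an empty row as a side effect of a read.
        return self.d.get(row, {}).get(col, default)
    def inrow(self, row, col):
        return col in self.d.get(row, {})
    def incol(self, col, val):
        # Scan the row dicts (values(), not items()); skip rows lacking col.
        return any(col in x and x[col] == val for x in self.d.values())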

etc., etc. -- write all the methods your app needs, with useful, short names, then maybe see if you can alias some of them as special methods if they're often used that way, for example (assuming Python 2.*; Python 3.* requires slightly different syntax):

    def __setitem__(self, (row, col), val):
        self.add(row, col, val)
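In Python 3, tuple parameters were removed from function signatures (PEP 3113), so a version-agnostic sketch receives the (row, col) tuple as a single key argument and unpacks it in the body. The __getitem__ alias, and the choice to raise KeyError for a missing cell, are illustrative additions rather than part of the original answer.

```python
import collections


class Table(object):
    def __init__(self):
        self.d = collections.defaultdict(dict)

    def add(self, row, col, val):
        self.d[row][col] = val

    def get(self, row, col, default=None):
        return self.d.get(row, {}).get(col, default)

    # table[row, col] = val passes the tuple (row, col) as one key;
    # unpacking it in the body works in both Python 2 and Python 3.
    def __setitem__(self, key, val):
        row, col = key
        self.add(row, col, val)

    def __getitem__(self, key):
        row, col = key
        # Check membership first so a failed read never creates a row.
        if row in self.d and col in self.d[row]:
            return self.d[row][col]
        raise KeyError(key)


t = Table()
t['row1', 'column1'] = 'value11'
print(t['row1', 'column1'])        # value11
print(t.get('row2', 'column1'))    # None
```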

and so forth. Once you have the code working, then comes the right time for profiling, benchmarking, and -- just perhaps -- internal optimization of the implementation.
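As one example of such internal optimization, the hypothetical IndexedTable below keeps the same add/get/inrow/incol methods but also maintains a per-column index (column -> value -> set of row keys), so incol becomes a dict lookup instead of a scan over every row. This sketch assumes cell values are hashable, and it accepts slower writes and extra memory in exchange; whether that trade pays off is exactly what a benchmark on your own workload should decide. The extra rows_where method is likewise an illustrative addition.

```python
import collections


class IndexedTable(object):
    """Same interface as Table, plus an index: col -> val -> set of rows.

    Assumes every cell value is hashable.
    """

    def __init__(self):
        self.d = {}
        self.index = collections.defaultdict(
            lambda: collections.defaultdict(set))

    def add(self, row, col, val):
        cells = self.d.setdefault(row, {})
        if col in cells:
            # Remove the stale index entry before overwriting the cell.
            old_rows = self.index[col][cells[col]]
            old_rows.discard(row)
            if not old_rows:
                del self.index[col][cells[col]]
        cells[col] = val
        self.index[col][val].add(row)

    def get(self, row, col, default=None):
        return self.d.get(row, {}).get(col, default)

    def inrow(self, row, col):
        return col in self.d.get(row, {})

    def incol(self, col, val):
        # Dict lookups instead of a row scan; .get() keeps read-only
        # queries from creating empty index entries.
        return bool(self.index.get(col, {}).get(val))

    def rows_where(self, col, val):
        """Return the set of row keys whose `col` cell equals `val`."""
        return set(self.index.get(col, {}).get(val, ()))


itable = IndexedTable()
itable.add('row1', 'column1', 'value11')
itable.add('row2', 'column1', 'value21')
itable.add('row1', 'column1', 'value99')        # overwrite updates the index
print(itable.incol('column1', 'value11'))       # False
print(itable.rows_where('column1', 'value99'))  # {'row1'}
```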