Tommy Gaboreau - 2 months ago
Python Question

Dictionary of dictionaries vs dictionary of class instances

I understand what a class is: a bundle of attributes and methods stored together in one object. However, I don't think I have ever really grasped their full power. I taught myself to manipulate large volumes of data using 'dictionary of dictionaries' data structures. I'm now thinking that if I want to fit in with the rest of the world, I need to use classes in my code, but I just don't get how to make the transition.

I have a script which gets information about sales orders from a SQL query, performs operations on the data, and outputs it to a csv.

1) The way I currently do it: store all the orders in a dictionary of dictionaries.

cursor.execute(querystring)

# create empty dictionary to hold orders
orders = {}

# define list of columns returned by query
columns = [col[0] for col in cursor.description]

for row in cursor:
    # create dictionary of dictionaries where key is order_id
    # this allows easy access of attributes given the order_id
    orders[row.order_id] = {}
    for i, v in enumerate(columns):
        # add each field to each order
        orders[row.order_id][v] = row[i]

# example operation (.items() works in both Python 2 and 3)
for order, fields in orders.items():
    fields['long'], fields['lat'] = getLongLat(fields['post_code'])

# example of another operation
cancelled_orders = getCancelledOrders()
for order_id in cancelled_orders:
    orders[order_id]['status'] = 'cancelled'

# Other similar operations go here...

# write to file here...
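
To make the dictionary-of-dictionaries pattern concrete, here is a runnable sketch that swaps the real SQL connection for an in-memory sqlite3 database. The sales table, its columns and the sample rows are made up for illustration. sqlite3 rows are plain tuples, so this version reads order_id from each inner dictionary rather than as row.order_id, and builds each inner dictionary with zip(columns, row) instead of enumerate.

```python
import sqlite3

# In-memory database standing in for the real SQL server (hypothetical schema and data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, post_code TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "AB1 2CD", "open"), (2, "EF3 4GH", "open")],
)

cursor = conn.execute("SELECT order_id, post_code, status FROM sales ORDER BY order_id")
# cursor.description holds one tuple per column; the first item is the column name
columns = [col[0] for col in cursor.description]

# Dictionary of dictionaries keyed by order_id; zip pairs each column name with its value
orders = {}
for row in cursor:
    fields = dict(zip(columns, row))
    orders[fields["order_id"]] = fields

# Mark an order as cancelled by key lookup, as in the question
orders[2]["status"] = "cancelled"
```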


2) The way I THINK I would do it if I were using classes:

class salesOrder():

    def __init__(self, cursor_row):
        for i, v in enumerate(columns):
            setattr(self, v, cursor_row[i])

    def getLongLat(self, long_lat_dict):
        self.long, self.lat = long_lat_dict[self.post_code]['long'], long_lat_dict[self.post_code]['lat']

    def cancelOrder(self):
        self.status = 'cancelled'

    # more methods here


cursor.execute(querystring)

# create empty dictionary to hold orders
orders = {}

# define list of columns returned by query
columns = [col[0] for col in cursor.description]

for row in cursor:
    orders[row.order_id] = salesOrder(row)
    orders[row.order_id].getLongLat()

# example of another operation
cancelled_orders = getCancelledOrders()
for order_id in cancelled_orders:
    orders[order_id].cancelOrder()

# other similar operations go here

# write to file here
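
As written, the loop calls getLongLat() with no argument while the method expects a long_lat_dict, and __init__ reads the module-level columns. Below is a runnable sketch of the same class, assuming both are passed in explicitly. The plain tuples stand in for cursor rows, and long_lat_dict, with its post codes and coordinates, is a hypothetical lookup table.

```python
class salesOrder(object):
    def __init__(self, columns, cursor_row):
        # Copy each column value onto the instance as an attribute
        for name, value in zip(columns, cursor_row):
            setattr(self, name, value)

    def getLongLat(self, long_lat_dict):
        # long_lat_dict is a hypothetical lookup: post_code -> {'long': ..., 'lat': ...}
        coords = long_lat_dict[self.post_code]
        self.long, self.lat = coords['long'], coords['lat']

    def cancelOrder(self):
        self.status = 'cancelled'


# Hypothetical data standing in for the query result and the coordinate lookup
columns = ['order_id', 'post_code', 'status']
rows = [(1, 'AB1 2CD', 'open'), (2, 'EF3 4GH', 'open')]
long_lat_dict = {
    'AB1 2CD': {'long': -2.1, 'lat': 57.1},
    'EF3 4GH': {'long': -0.1, 'lat': 51.5},
}

orders = {}
for row in rows:
    order = salesOrder(columns, row)
    order.getLongLat(long_lat_dict)  # pass the lookup the method expects
    orders[order.order_id] = order

cancelled_orders = [2]  # hypothetical result of getCancelledOrders()
for order_id in cancelled_orders:
    orders[order_id].cancelOrder()
```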


I just get the impression that I'm not quite understanding the best way to use classes. Have I got completely the wrong idea about how to use them? Is there some sense to what I'm doing, but it needs refactoring? Or am I trying to use classes for the wrong purpose?

Answer

I have to guess at some of what you are doing, since I don't know what your "row" looks like. I assume the variable columns is a list of column names and that each row exposes its fields as attributes (as in row.order_id). If that is the case, please consider this code snippet:

class SalesOrder(object):
    def __init__(self, columns, row):
        """ Transfer all the columns from row to this object """
        self._columns = columns  # remember the column order for as_row()
        for name in columns:
            value = getattr(row, name)
            setattr(self, name, value)
        self.long, self.lat = getLongLat(self.post_code)

    def cancel(self):
        self.status = 'cancelled'

    def as_row(self):
        return [getattr(self, name) for name in self._columns]

    def __repr__(self):
        return repr(self.as_row())

# Create the dictionary of class instances
orders = {row.order_id: SalesOrder(columns, row) for row in cursor}

# Cancel
cancelled_orders = getCancelledOrders()
for order_id in cancelled_orders:
    orders[order_id].cancel()

# Print all sales orders
for sales_order in orders.values():
    print(sales_order)
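
Here is a self-contained way to try the snippet end to end, as a sketch under several assumptions: an in-memory sqlite3 database replaces the real server, getLongLat() and getCancelledOrders() are hypothetical stubs returning made-up data, and a namedtuple row factory supplies the row.order_id style attribute access the snippet relies on. The class keeps its own copy of columns so that as_row() does not depend on a module-level variable.

```python
import sqlite3
from collections import namedtuple


def getLongLat(post_code):
    # Hypothetical stand-in for the real geocoding helper
    lookup = {"AB1 2CD": (-2.1, 57.1), "EF3 4GH": (-0.1, 51.5)}
    return lookup[post_code]


def getCancelledOrders():
    # Hypothetical stand-in: returns the ids of cancelled orders
    return [2]


def namedtuple_factory(cursor, row):
    # Row factory giving attribute access (row.order_id), as the answer assumes
    Row = namedtuple("Row", [col[0] for col in cursor.description])
    return Row(*row)


class SalesOrder(object):
    def __init__(self, columns, row):
        self._columns = columns
        for name in columns:
            setattr(self, name, getattr(row, name))
        self.long, self.lat = getLongLat(self.post_code)

    def cancel(self):
        self.status = "cancelled"

    def as_row(self):
        return [getattr(self, name) for name in self._columns]

    def __repr__(self):
        return repr(self.as_row())


conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
conn.execute("CREATE TABLE sales (order_id INTEGER, post_code TEXT, status TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [(1, "AB1 2CD", "open"), (2, "EF3 4GH", "open")])

cursor = conn.execute("SELECT order_id, post_code, status FROM sales ORDER BY order_id")
columns = [col[0] for col in cursor.description]
orders = {row.order_id: SalesOrder(columns, row) for row in cursor}

for order_id in getCancelledOrders():
    orders[order_id].cancel()

for sales_order in orders.values():
    print(sales_order)
```

Running it prints [1, 'AB1 2CD', 'open'] followed by [2, 'EF3 4GH', 'cancelled'].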

At the lowest level, we need to be able to create a new SalesOrder object from the row object by copying over all the attributes listed in columns. When a SalesOrder object is initialized, we also calculate its longitude and latitude.
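
The copying step on its own is just getattr() on the source paired with setattr() on the target. A minimal sketch, using a hypothetical namedtuple Row in place of a cursor row and an empty Record class as the target:

```python
from collections import namedtuple

# Hypothetical row type with attribute access, standing in for a cursor row
Row = namedtuple("Row", ["order_id", "post_code", "status"])
row = Row(order_id=7, post_code="AB1 2CD", status="open")
columns = list(Row._fields)


class Record(object):
    pass


record = Record()
for name in columns:
    # getattr reads row.<name>; setattr creates record.<name> with the same value
    setattr(record, name, getattr(row, name))

print(record.order_id, record.post_code, record.status)  # 7 AB1 2CD open
```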

With that in place, creating the dictionary of SalesOrder objects becomes easier:

orders = {row.order_id: SalesOrder(columns, row) for row in cursor}

orders is a dictionary with order_id as keys and SalesOrder objects as values. Finally, cancelling the orders works the same way as in your code.
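
The dictionary comprehension is shorthand for an explicit loop, and because the dictionary holds references to the SalesOrder objects, calling cancel() through orders[order_id] updates the stored object in place, much as assigning to orders[order_id]['status'] did with the inner dictionaries. A sketch with a pared-down, hypothetical SalesOrder that takes its two fields directly:

```python
class SalesOrder(object):
    # Minimal hypothetical version: only the attributes needed for this demo
    def __init__(self, order_id, status):
        self.order_id = order_id
        self.status = status

    def cancel(self):
        self.status = "cancelled"


rows = [(1, "open"), (2, "open"), (3, "open")]

# Comprehension form...
orders = {order_id: SalesOrder(order_id, status) for order_id, status in rows}

# ...builds the same mapping as this explicit loop
orders_loop = {}
for order_id, status in rows:
    orders_loop[order_id] = SalesOrder(order_id, status)

# The dictionary stores a reference, so cancelling through it changes the object itself
order = orders[2]
orders[2].cancel()
print(order.status)  # cancelled
```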

In addition to what you have, I created a method called as_row(), which is handy if you later wish to write a SalesOrder object to a CSV file or database. For now, I use it to display the "raw" row. Normally, print will invoke the __str__() method to get a string representation of an object; if the class does not define __str__(), it falls back to the __repr__() method, which is what we have here.
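
A sketch of both uses of as_row(), assuming a pared-down SalesOrder that takes its values as a list rather than from a cursor row: csv.writer writes one line per order (io.StringIO stands in for an output file), and because the class defines no __str__(), printing an order falls back to __repr__().

```python
import csv
import io


class SalesOrder(object):
    # Minimal hypothetical version keeping the as_row() and __repr__() methods
    def __init__(self, columns, values):
        self._columns = columns
        for name, value in zip(columns, values):
            setattr(self, name, value)

    def as_row(self):
        return [getattr(self, name) for name in self._columns]

    def __repr__(self):
        return repr(self.as_row())


columns = ["order_id", "post_code", "status"]
orders = {
    1: SalesOrder(columns, [1, "AB1 2CD", "open"]),
    2: SalesOrder(columns, [2, "EF3 4GH", "cancelled"]),
}

# Write a header plus one CSV line per order
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
for order in orders.values():
    writer.writerow(order.as_row())

# No __str__ is defined, so print() falls back to __repr__
print(orders[1])  # [1, 'AB1 2CD', 'open']
```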
