The Maestro - 6 months ago
Python Question

Unifying similar functions into one

I have some calculations on biological data. Each function calculates the total, average, min and max values for one list of objects.
The problem is that I have many different lists, each one for a different object type.
I don't want to repeat my code in every function, changing only the "for" line and the call to the object's method!

For example:

Volume function:

import sys

def calculate_volume(self):
    total = 0
    min = sys.maxint
    max = -1
    compartments_counter = 0

    for n in self.nodes:

        compartments_counter += 1
        current = n.get_compartment_volume()
        if min > current:
            min = current
        if max < current:
            max = current

        total += current

    avg = float(total) / compartments_counter
    return total, avg, min, max


Contraction function:

import sys

def get_contraction(self):
    total = 0
    min = sys.maxint
    max = -1
    branches_count = len(self.branches)

    for branch in self.branches:

        current = branch.get_contraction()
        if min > current:
            min = current
        if max < current:
            max = current

        total += current

    avg = float(total) / branches_count

    return total, avg, min, max


Both functions are almost identical; only the list being iterated and the method being called differ!

I know I can use sum, min, max, etc., but when I apply them to my values they take more time than doing everything in the loop, because they can't be called at once (each one needs its own pass over the data).

I just want to know: is writing a separate function for every calculation the right (i.e. professional) way to do it? Or can I write one function and pass it the list, the object type and the method to call?
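A minimal sketch of the "one function" idea from the question, assuming each object exposes a zero-argument method whose name is passed as a string. The helper `get_stats_single_pass` and the `Branch` class are hypothetical stand-ins, not code from the original project. It keeps the single Python-level loop of the original functions:

```python
import operator


def get_stats_single_pass(objects, method_name):
    """Hypothetical helper: (total, average, min, max) of obj.<method_name>().

    One Python-level pass, like the question's functions.
    Raises ValueError on empty input instead of dividing by zero.
    """
    getter = operator.methodcaller(method_name)  # getter(obj) calls obj.<method_name>()
    count = 0
    total = 0
    lowest = highest = None
    for obj in objects:
        current = getter(obj)
        count += 1
        total += current
        # None-initialised bounds replace the sys.maxint / -1 sentinels
        if lowest is None or current < lowest:
            lowest = current
        if highest is None or current > highest:
            highest = current
    if count == 0:
        raise ValueError("no objects to summarise")
    return total, float(total) / count, lowest, highest


class Branch(object):
    """Hypothetical stand-in for the question's branch objects."""

    def __init__(self, contraction):
        self.contraction = contraction

    def get_contraction(self):
        return self.contraction


branches = [Branch(0.5), Branch(1.5), Branch(1.0)]
print(get_stats_single_pass(branches, "get_contraction"))  # (3.0, 1.0, 0.5, 1.5)
```

Starting the bounds at None rather than `sys.maxint` and `-1` also keeps the result correct when every value is below -1, and works on Python 3, where `sys.maxint` no longer exists.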

Answer

It's hard to say without seeing the rest of the code, but from the limited view given I'd reckon you shouldn't have these calculations in methods at all. I also really don't understand your reasoning for not using the builtins ("they can't be called at once"?). If you're implying that computing the 4 statistics in a single pass in pure Python is faster than 4 passes with the builtins (implemented in C), then I'm afraid that's a very wrong assumption.

That said, here's my take on the problem:

def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l))

# then create numeric lists from your data and send 'em through:

node_volumes = [n.get_compartment_volume() for n in self.nodes]
branch_contractions = [b.get_contraction() for b in self.branches]

# ...

total_1, avg_1, min_1, max_1 = get_stats(node_volumes)
total_2, avg_2, min_2, max_2 = get_stats(branch_contractions)
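To also get the "pass the method to call" flexibility the question asks about while still relying on the builtins, the list-building step can be parameterised by method name. A minimal, self-contained sketch, assuming each object exposes a zero-argument getter; `stats_for`, `Node` and `Branch` are hypothetical names used only for illustration:

```python
from operator import methodcaller


def get_stats(l):
    # Same as above: four passes, each done by a C-implemented builtin
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l))


def stats_for(objects, method_name):
    """Hypothetical helper: call obj.<method_name>() on every object, then summarise."""
    values = list(map(methodcaller(method_name), objects))
    return get_stats(values)


class Node(object):
    """Hypothetical stand-in for the question's node objects."""

    def __init__(self, volume):
        self.volume = volume

    def get_compartment_volume(self):
        return self.volume


class Branch(object):
    """Hypothetical stand-in for the question's branch objects."""

    def __init__(self, contraction):
        self.contraction = contraction

    def get_contraction(self):
        return self.contraction


nodes = [Node(2), Node(4), Node(9)]
branches = [Branch(0.25), Branch(0.75)]

total_1, avg_1, min_1, max_1 = stats_for(nodes, "get_compartment_volume")
total_2, avg_2, min_2, max_2 = stats_for(branches, "get_contraction")
print(total_1, avg_1, min_1, max_1)  # 15 5.0 2 9
print(total_2, avg_2, min_2, max_2)  # 1.0 0.5 0.25 0.75
```

The per-type code shrinks to a method name, and the statistics themselves live in one place.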

EDIT

Some benchmarks to show that the builtins win:

mine.py

import sys

def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l)
    )


branches = [i for i in xrange(10000000)]

print get_stats(branches)

Versus yours.py

import sys

branches = [i for i in xrange(10000000)]

total = 0
min = sys.maxint
max = -1
branches_count = branches.__len__()

for current in branches:
    if min > current:
        min = current
    if max < current:
        max = current

    total += current

avg = float(total) / branches_count

print total, avg, min, max

And finally, timing both scripts:

smassey@hacklabs:/tmp $ time python mine.py 
(49999995000000, 4999999.5, 0, 9999999)

real    0m1.225s
user    0m0.996s
sys     0m0.228s
smassey@hacklabs:/tmp $ time python yours.py 
49999995000000 4999999.5 0 9999999

real    0m2.369s
user    0m2.180s
sys     0m0.180s
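The shell `time` figures include interpreter start-up and building the ten-million-element list, which both scripts share. A rough sketch for timing only the statistics step with the standard `timeit` module; it is written for Python 3, and the function names, list size and run count are arbitrary choices, so absolute numbers will vary by machine and Python version:

```python
import timeit


def get_stats(l):
    # Builtin version from the answer: four passes, each implemented in C
    s = sum(l)
    return s, float(s) / len(l), min(l), max(l)


def loop_stats(l):
    # Single pure-Python pass, mirroring the question's loop (needs a non-empty list)
    total = 0
    lowest = highest = l[0]
    for current in l:
        if current < lowest:
            lowest = current
        if current > highest:
            highest = current
        total += current
    return total, float(total) / len(l), lowest, highest


data = list(range(100000))

# Both versions must agree before their timings are worth comparing
assert get_stats(data) == loop_stats(data)

for func in (get_stats, loop_stats):
    # timeit.timeit calls the function `number` times and returns total seconds
    seconds = timeit.timeit(lambda: func(data), number=20)
    print("%s: %.3f s for 20 runs" % (func.__name__, seconds))
```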

Cheers