The Maestro - 9 months ago 23

Python Question

I have some calculations on biological data. Each function calculates the total, average, min, max values for one list of objects.

The idea is that I have a lot of different lists, each one for a different object type.

I don't want to repeat my code for every function just changing the "for" line and the call of the object's method!

For example:

Volume function:

```
def calculate_volume(self):
    total = 0
    min = sys.maxint
    max = -1
    compartments_counter = 0
    for n in self.nodes:
        compartments_counter += 1
        current = n.get_compartment_volume()
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / compartments_counter
    return total, avg, min, max
```

Contraction function:

```
def get_contraction(self):
    total = 0
    min = sys.maxint
    max = -1
    branches_count = self.branches.__len__()
    for branch in self.branches:
        current = branch.get_contraction()
        if min > current:
            min = current
        if max < current:
            max = current
        total += current
    avg = float(total) / branches_count
    return total, avg, min, max
```

Both functions look almost the same, just a little modification!

I know I can use sum, min, max, etc., but when I apply them to my values they take more time than doing the work in the loop because they can't be called at once.

I just want to know whether writing a separate function for every calculation is the right (i.e. professional) way. Or could I write one function and pass it the list, the object type, and the method to call?

Answer

It's hard to say without seeing the rest of the code, but from the limited view given I'd reckon these calculations shouldn't be methods at all. I also really don't understand your reasoning for not using the builtins ("they can't be called at once"?). If you're implying that computing the four statistics in a single pass in pure Python is faster than four passes with the builtins (which are implemented in C), then I'm afraid that assumption is wrong.

That said, here's my take on the problem:

```
def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l))

# then create numeric lists from your data and send 'em through:
node_volumes = [n.get_compartment_volume() for n in self.nodes]
branches = [b.get_contraction() for b in self.branches]
# ...
total_1, avg_1, min_1, max_1 = get_stats(node_volumes)
total_2, avg_2, min_2, max_2 = get_stats(branches)
```
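The question's idea of one function that takes the list and the method to call can be layered on top of `get_stats`. The following is only a minimal sketch: `stats_for` and the `Node` class are hypothetical names made up for illustration (the real node and branch classes are not shown in the question), and it is written so that it runs on Python 3 as well.

```python
from operator import methodcaller


def get_stats(values):
    """Return (total, average, minimum, maximum) for a non-empty list of numbers."""
    total = sum(values)
    return total, float(total) / len(values), min(values), max(values)


def stats_for(items, method_name):
    """Call the named zero-argument method on every item, then summarise the results.

    Hypothetical helper: not part of the original answer.
    """
    getter = methodcaller(method_name)
    return get_stats([getter(item) for item in items])


class Node(object):
    """Hypothetical stand-in for the question's node objects."""

    def __init__(self, volume):
        self._volume = volume

    def get_compartment_volume(self):
        return self._volume


nodes = [Node(2.0), Node(4.0), Node(9.0)]
result = stats_for(nodes, "get_compartment_volume")
```

With this shape, each new calculation is one call such as `stats_for(self.branches, "get_contraction")` rather than another copy of the loop. Passing a callable instead of a method name would work just as well; the string form is only one possible design.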

EDIT

Some benchmarks showing that the builtins win:

mine.py

```
import sys

def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l)
    )

branches = [i for i in xrange(10000000)]
print get_stats(branches)
```

Versus yours.py

```
import sys

branches = [i for i in xrange(10000000)]
total = 0
min = sys.maxint
max = -1
branches_count = branches.__len__()
for current in branches:
    if min > current:
        min = current
    if max < current:
        max = current
    total += current
avg = float(total) / branches_count
print total, avg, min, max
```

And finally with some timers:

```
smassey@hacklabs:/tmp $ time python mine.py
(49999995000000, 4999999.5, 0, 9999999)
real 0m1.225s
user 0m0.996s
sys 0m0.228s
smassey@hacklabs:/tmp $ time python yours.py
49999995000000 4999999.5 0 9999999
real 0m2.369s
user 0m2.180s
sys 0m0.180s
```
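If you'd rather compare the two approaches inside one process instead of timing whole scripts, something like the following sketch with the standard `timeit` module should do. The function names `builtin_stats` and `loop_stats` are made up for this example, the list is much smaller than in the runs above to keep it quick, and `float("inf")` stands in for `sys.maxint` so that the code also runs on Python 3. Actual timings depend on the machine and interpreter, so none are quoted here.

```python
import timeit


def builtin_stats(values):
    """Four passes over the data, each done by a builtin."""
    total = sum(values)
    return total, float(total) / len(values), min(values), max(values)


def loop_stats(values):
    """One explicit pass over the data in pure Python."""
    total = 0
    lowest = float("inf")
    highest = float("-inf")
    for current in values:
        if current < lowest:
            lowest = current
        if current > highest:
            highest = current
        total += current
    return total, float(total) / len(values), lowest, highest


data = list(range(200000))

# Both approaches must agree before their timings are worth comparing.
assert builtin_stats(data) == loop_stats(data)

builtin_seconds = timeit.timeit(lambda: builtin_stats(data), number=5)
loop_seconds = timeit.timeit(lambda: loop_stats(data), number=5)
print("builtins: %.3fs, loop: %.3fs" % (builtin_seconds, loop_seconds))
```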

Cheers