The Maestro - 1 year ago 55

Python Question

I am running some calculations on biological data. Each function calculates the total, average, min, and max values for one list of objects.

I have many different lists, each one holding a different object type.

I don't want to repeat the same code in every function, changing only the "for" line and the call to the object's method!

For example:

Volume function:

    def calculate_volume(self):
        total = 0
        min = sys.maxint
        max = -1
        compartments_counter = 0
        for n in self.nodes:
            compartments_counter += 1
            current = n.get_compartment_volume()
            if min > current:
                min = current
            if max < current:
                max = current
            total += current
        avg = float(total) / compartments_counter
        return total, avg, min, max

Contraction function:

    def get_contraction(self):
        total = 0
        min = sys.maxint
        max = -1
        branches_count = self.branches.__len__()
        for branch in self.branches:
            current = branch.get_contraction()
            if min > current:
                min = current
            if max < current:
                max = current
            total += current
        avg = float(total) / branches_count
        return total, avg, min, max

Both functions look almost the same, with only small modifications!

I know I can use sum, min, max, etc., but when I apply them to my values they take more time than doing the work in the loop, because they can't be called at once.

I just want to know whether writing a function for every calculation is the right (i.e. professional) way. Or could I write one function and pass it the list, the object type, and the method to call?

Answer Source

It's hard to say without seeing the rest of the code, but from the limited view given I'd reckon these calculations shouldn't live in methods at all. I also don't understand your reasoning for not using the builtins ("they can't be called at once"?). If you're implying that computing the four statistics in a single pass in pure Python is faster than four passes with the builtins (which are implemented in C), then I'm afraid that assumption is wrong.

That said, here's my take on the problem:

```
def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l))

# then create numeric lists from your data and send 'em through:
node_volumes = [n.get_compartment_volume() for n in self.nodes]
branches = [b.get_contraction() for b in self.branches]
# ...
total_1, avg_1, min_1, max_1 = get_stats(node_volumes)
total_2, avg_2, min_2, max_2 = get_stats(branches)
```
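To avoid writing one wrapper per object type, the same idea can be sketched as a single helper that takes the collection and the name of the method to call. This is only a sketch: `stats_for`, `Node`, and `Branch` are hypothetical names used for illustration, and the stand-in classes assume each object exposes a no-argument method that returns a number.

```python
from operator import methodcaller


def get_stats(values):
    """Return (total, average, minimum, maximum) for a non-empty list of numbers."""
    total = sum(values)
    return total, float(total) / len(values), min(values), max(values)


def stats_for(items, method_name):
    """Call the named no-argument method on every item and summarize the results."""
    getter = methodcaller(method_name)
    return get_stats([getter(item) for item in items])


# Hypothetical stand-ins for the real node and branch classes.
class Node(object):
    def __init__(self, volume):
        self.volume = volume

    def get_compartment_volume(self):
        return self.volume


class Branch(object):
    def __init__(self, contraction):
        self.contraction = contraction

    def get_contraction(self):
        return self.contraction


nodes = [Node(v) for v in (2.0, 4.0, 6.0)]
branches = [Branch(c) for c in (0.5, 0.25)]

volume_stats = stats_for(nodes, "get_compartment_volume")
contraction_stats = stats_for(branches, "get_contraction")
```

Inside your class, the calls would look like `stats_for(self.nodes, "get_compartment_volume")` and `stats_for(self.branches, "get_contraction")`.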

EDIT

Some benchmarks showing that the builtins win:

mine.py

```
import sys

def get_stats(l):
    s = sum(l)
    return (
        s,
        float(s) / len(l),
        min(l),
        max(l)
    )

branches = [i for i in xrange(10000000)]
print get_stats(branches)
```

Versus yours.py

```
import sys

branches = [i for i in xrange(10000000)]
total = 0
min = sys.maxint
max = -1
branches_count = branches.__len__()
for current in branches:
    if min > current:
        min = current
    if max < current:
        max = current
    total += current
avg = float(total) / branches_count
print total, avg, min, max
```

And finally, the timings:

```
smassey@hacklabs:/tmp $ time python mine.py
(49999995000000, 4999999.5, 0, 9999999)
real 0m1.225s
user 0m0.996s
sys 0m0.228s
smassey@hacklabs:/tmp $ time python yours.py
49999995000000 4999999.5 0 9999999
real 0m2.369s
user 0m2.180s
sys 0m0.180s
```
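The same comparison can be sketched inside one script with `timeit`, which keeps interpreter start-up and list construction out of the measurement. This is a sketch, not the benchmark above: `builtin_stats` and `loop_stats` are hypothetical names, the list is smaller, and the loop seeds its running minimum and maximum from the first element instead of `sys.maxint` and `-1` so that it also runs on Python 3. Absolute timings will vary by machine and Python version.

```python
import timeit


def builtin_stats(values):
    """Four passes over the data, each executed by a builtin."""
    total = sum(values)
    return total, float(total) / len(values), min(values), max(values)


def loop_stats(values):
    """One pass over the data, executed in the interpreter."""
    total = 0
    lowest = highest = values[0]
    for current in values:
        if current < lowest:
            lowest = current
        if current > highest:
            highest = current
        total += current
    return total, float(total) / len(values), lowest, highest


data = list(range(200000))

# Both approaches must agree before their timings are worth comparing.
same_result = builtin_stats(data) == loop_stats(data)

builtin_time = timeit.timeit(lambda: builtin_stats(data), number=3)
loop_time = timeit.timeit(lambda: loop_stats(data), number=3)
print("builtins: %.3fs  loop: %.3fs" % (builtin_time, loop_time))
```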

Cheers
