johnhenry - 1 year ago 108
Python Question

Python - Selecting elements that sum up to the 68% of the cumulative sum in a 2D list

I have my values in a two dimensional list.

a = [[5,2],[7,4],[0,3]]

Thanks to the answers to this question
sorting list of lists and getting indices in unsorted list
I managed to sort them in decreasing order and I got the coordinates of the values in the previous list.

from operator import itemgetter

b = [((i, j), v) for i, t in enumerate(a) for j, v in enumerate(t)]
b.sort(key=itemgetter(-1), reverse=True)
print(b)
coords, vals = zip(*b)
print(vals)
print(coords)

output:

[((1, 0), 7), ((0, 0), 5), ((1, 1), 4), ((2, 1), 3), ((0, 1), 2), ((2, 0), 0)]
(7, 5, 4, 3, 2, 0)
((1, 0), (0, 0), (1, 1), (2, 1), (0, 1), (2, 0))

Now I need to first of all calculate the total cumulative sum of the elements, which I perform by using

cumulative_sum = np.cumsum(a)

Then I need to start summing the ordered elements, until they reach a certain value, which is 68% of
cumulative_sum
. In this case, that would mean doing this logical operation:

1) cumulative_sum = 5+2+7+4+0+3 = 21

2) 68% of cumulative_sum = 14.28

3) then start summing: 7+5+4+... and select the relative elements, just until they exceed 14.28 (in this simplified example this would select only the values 7 and 5)

4) Of the selected elements, I need to get the coordinates in the array
a

(in this case,
(1,0)
and
(0,0)
)

This is easily done using a simple for loop to iterate over the (coordinate, value) pairs in b. The trick is to test if the sum would overflow the desired limit before we add the coordinate tuple t to the list of coordinates in selected.

from operator import itemgetter

a = [[5,2],[7,4],[0,3]]

b = [((i, j), v) for i, t in enumerate(a) for j, v in enumerate(t)]
b.sort(key=itemgetter(-1), reverse=True)
coords, vals = zip(*b)

total = sum(vals)
lim = int(0.68 * total)

selected = []
s = 0
for t, v in b:
if s + v <= lim:
s += v
selected.append(t)
else:
break

print(s, selected)

output

12 [(1, 0), (0, 0)]

The limit calculation could be written as

lim = 0.68 * total

but if your data is guaranteed to be integers then it's neater to have lim as an integer too, because comparing 2 integers is slightly more efficient than comparing an integer to a float. When you do an operation combining an int with a float (and that includes comparisons) they have to be converted to a common type to perform the operation.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download