James James - 8 months ago 23

How do I telescope the columns of a numpy array?

I have a numpy array and want to "telescope" the values based on the top row. An example is the best way to describe it.

Start array:

9 9 8 7 7 7 6
1 2 3 4 5 6 3
3 4 5 6 7 6 3
5 6 7 8 9 6 4

Desired output array:

9 8 7 6
3 3 15 3
7 5 19 3
11 7 23 4

The idea is to unique-ify the top row and, for each distinct value in it, sum the columns below it that share that value. The top row will be sorted (descending in the example), and the real array will be about 2000 cells wide and 200,000 cells long. Any number of consecutive columns can share the same top-row value. My current hack is below. Its top-row labels differ slightly from the example, and it prints to screen instead of building the final array so that I can check the output; the plan is to stack the printed sums to generate the output array.

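import numpy as N
# kk: 2-D array, row 0 = labels. The summing lines were lost from the post;
# the ll lines below are a reconstruction that shows the behaviour described.
ll = kk[:,0].copy()
for i in range(1,len(kk[0])):
    if kk[0][i]==kk[0][i-1]:
        ll = ll + kk[:,i]
    elif kk[0][i]!=kk[0][i-1]:
        print("sum=", ll, i, kk[0][i], kk[0][i-1])
        ll = kk[:,i].copy()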

There are two defects. The major one is that it never reports the final column, and I don't see why. The minor one is that it also sums the top row; it's obvious why that happens, and I suspect I can kludge my way around it. The missing final column, though, has been frustrating me for a while, and I'd really appreciate any suggestions for dealing with it.
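The missing final column follows from the loop's structure: a group's sum is only reported when column i carries a different label from column i-1, and no column comes after the last one to trigger that branch, so the last group is never flushed. Below is a minimal sketch of the same loop with an explicit flush after it, with the top row excluded from the sums, and with the sums stacked into an output array as planned. It assumes kk is a 2-D NumPy array whose row 0 holds the labels, with equal labels in adjacent columns; the function name telescope_loop is hypothetical.

```python
import numpy as np


def telescope_loop(kk):
    """Sum the columns of kk[1:] that share a label in kk[0] (hypothetical helper).

    Assumes kk is 2-D and equal labels in row 0 sit in adjacent columns.
    """
    labels = []
    sums = []
    running = kk[1:, 0].copy()  # start with column 0, top row excluded
    for i in range(1, kk.shape[1]):
        if kk[0, i] == kk[0, i - 1]:
            running = running + kk[1:, i]  # same label: keep accumulating
        else:
            labels.append(kk[0, i - 1])  # label changed: flush the finished group
            sums.append(running)
            running = kk[1:, i].copy()  # start a new group at column i
    # Flush the last group. No later column exists to trigger the else branch,
    # which is why a loop without this step never reports the final column.
    labels.append(kk[0, -1])
    sums.append(running)
    # Labels on top, one column of sums per group underneath.
    return np.vstack([np.array(labels), np.column_stack(sums)])


example = np.array([[9, 9, 8, 7, 7, 7, 6],
                    [1, 2, 3, 4, 5, 6, 3],
                    [3, 4, 5, 6, 7, 6, 3],
                    [5, 6, 7, 8, 9, 6, 4]])
print(telescope_loop(example))
```

On the example array from the question this reproduces the desired 4 x 4 output, including the final column for label 6.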

Thanks for any help.


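You should use numpy's unique function. With return_inverse=True it also returns, for every column, the index of that column's label in the sorted array of unique labels, which tells you which columns to sum together: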

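import numpy as np

a = np.array([[90,90,85,80,80,80,70],[1,2,3,4,5,6,3],[3,4,5,6,7,6,3],[5,6,7,8,9,6,4]])

# u: sorted unique labels; v[j]: position in u of column j's label
u, v = np.unique(a[0], return_inverse=True)

output = np.zeros((a.shape[0], u.shape[0]))
output[0] = u  # top row: the unique labels
for i in range(u.shape[0]):
    pos = np.where(v == i)[0]  # columns whose label is u[i]
    output[1:,i] = np.sum(a[1:,pos], axis=1)  # sum those columns, skipping row 0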

Note that u is sorted from lowest to highest. If you want the labels from highest to lowest, as in the question, reverse the columns at the end:

output = output[:,::-1]
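Since the top row is sorted, equal labels always sit in adjacent columns, so the per-label loop can also be replaced by one call to np.add.reduceat, which sums contiguous slices that begin at the given start indices. Below is a minimal sketch, assuming a is a 2-D NumPy array with the labels in row 0 and equal labels adjacent; telescope_reduceat is a hypothetical name. It keeps the labels in their original order, so no reversal is needed, and the last group is handled automatically because the final slice runs to the end of the row.

```python
import numpy as np


def telescope_reduceat(a):
    """Collapse runs of equal labels in a[0], summing the rows below (hypothetical helper).

    Assumes equal labels are adjacent, so each label is one contiguous block of columns.
    """
    top = a[0]
    # First column of each run: column 0, plus every column whose label
    # differs from the label of the column before it.
    starts = np.flatnonzero(np.concatenate(([True], top[1:] != top[:-1])))
    # reduceat sums a[1:, starts[k]:starts[k+1]] for each k; the last slice
    # runs to the end, so the final group is included.
    sums = np.add.reduceat(a[1:], starts, axis=1)
    return np.vstack([top[starts], sums])


a = np.array([[90, 90, 85, 80, 80, 80, 70],
              [1, 2, 3, 4, 5, 6, 3],
              [3, 4, 5, 6, 7, 6, 3],
              [5, 6, 7, 8, 9, 6, 4]])
print(telescope_reduceat(a))
```

Unlike the np.zeros version above, this keeps the integer dtype of the input. For the 2000-column array described in the question it avoids a mask and a fancy-indexed copy per label, so it should scale better, though that is an expectation rather than a measured result.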