Vaibhav Agarwal -4 years ago 146
Python Question

Ranking 2D Numpy array

I have a numpy array with 1000 rows and 2 columns as:

``````[[ 0.76        1.28947368]
[ 0.7         0.97142857]
[ 0.7         1.48571429]
[ 0.68        1.11764706]
[ 0.68        1.23529412]
[ 0.68        1.41176471]
[ 0.68        1.41176471]
[ 0.68        1.44117647]
[ 0.66        0.78787879]
[ 0.66        1.03030303]
[ 0.66        1.09090909]
[ 0.66        1.15151515]
[ 0.66        1.15151515]
[ 0.66        1.21212121]
[ 0.66        1.24242424]]
``````

As evident, this array is sorted in descending order by column 0 and in ascending order by column 1. I want to assign rank to each row of this array such that duplicate rows (values in both column of two or more rows are equal) have same rank and insert rank as column 2.

Expected Output:

``````     [[0.76        1.28947368  1]
[ 0.7         0.97142857  2]
[ 0.7         1.48571429  3]
[ 0.68        1.11764706  4]
[ 0.68        1.23529412  5]
[ 0.68        1.41176471  6]
[ 0.68        1.41176471  6]  # as this row is duplicate of row above it
[ 0.68        1.44117647  7]
[ 0.66        0.78787879  8]
[ 0.66        1.03030303  9]
[ 0.66        1.09090909  10]
[ 0.66        1.15151515  11]
[ 0.66        1.15151515  11] # as this row is duplicate of row above it
[ 0.66        1.21212121  12]
[ 0.66        1.24242424  13]]
``````

What is the most efficient way to achieve this??

Answer Source

For sorted array, like in the given sample, its easy -

``````rank = np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()
out = np.column_stack(( a, rank ))
``````

As alternative to `(a[1:] != a[:-1]).any(1)`, we could use the following for performance :

``````(a[1:,0] != a[:-1,0]) | (a[1:,1] != a[:-1,1])
``````

Sample step-by-step run

1) Input array :

``````In [70]: a
Out[70]:
array([[ 0.76      ,  1.28947368],
[ 0.68      ,  1.41176471],
[ 0.68      ,  1.41176471],
[ 0.68      ,  1.44117647],
[ 0.66      ,  1.09090909],
[ 0.66      ,  1.15151515],
[ 0.66      ,  1.15151515],
[ 0.66      ,  1.24242424]])
``````

2) Get a mask of inequality between consecutive rows. The idea here is that since the array is sorted, so the duplicate rows would have identical elements across both columns. So, with the inequality across both columns, we would have a 1D mask, but one element less than the total number of rows in original array, as we used slicing with one element left off :

``````In [71]: a[1:] != a[:-1]
Out[71]:
array([[ True,  True],
[False, False],
[False,  True],
[ True,  True],
[False,  True],
[False, False],
[False,  True]], dtype=bool)

In [72]: (a[1:] != a[:-1]).any(1)
Out[72]: array([ True, False,  True,  True,  True, False,  True], dtype=bool)
``````

Now, to compensate for the one-less element and since we need to start the ranking from `1` and we intend to use cumumlative summation for this incremental ranking, let's append a `1` at the start and then use `cumsum` to give us the expected ranks :

``````In [75]: np.r_[True, (a[1:] != a[:-1]).any(1)]
Out[75]: array([ True,  True, False,  True,  True,  True, False,  True], dtype=bool)

In [76]: np.r_[True, (a[1:] != a[:-1]).any(1)].cumsum()
Out[76]: array([1, 2, 2, 3, 4, 5, 5, 6])
``````

To visually verify, here's the stacked output :

``````In [77]: np.column_stack(( a, _ ))
Out[77]:
array([[ 0.76      ,  1.28947368,  1.        ],
[ 0.68      ,  1.41176471,  2.        ],
[ 0.68      ,  1.41176471,  2.        ],
[ 0.68      ,  1.44117647,  3.        ],
[ 0.66      ,  1.09090909,  4.        ],
[ 0.66      ,  1.15151515,  5.        ],
[ 0.66      ,  1.15151515,  5.        ],
[ 0.66      ,  1.24242424,  6.        ]])
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download
Latest added