 ajsp -5 years ago 185
Python Question

# Find rank and percentage rank in list

I have some very large lists that I am working with (>1M rows), and I am trying to find a fast (the fastest?) way of, given a float, ranking that float compared to the list of floats, and finding it's percentage rank compared to the range of the list. Here is my attempt, but it's extremely slow:

``````X =[0.595068426145485,
0.613726840488019,
1.1532608695652,
1.92952380952385,
4.44137931034496,
3.46432160804035,
2.20331487122673,
2.54736842105265,
3.57702702702689,
1.93202764976956,
1.34720184204056,
0.824997304105564,
0.765782842381996,
0.615110856990126,
0.622708022872803,
1.03211045820975,
0.997225012974318,
0.496352327702226,
0.67103858866700,
0.452224068868272,
0.441842124852685,
0.447584524952608,
0.4645525042246]

val = 1.5
arr = np.array(X) #X is actually a pandas column, hence the conversion
arr = np.insert(arr,1,val, axis=None) #insert the val into arr, to then be ranked
st  = np.sort(arr)

RANK      = float([i for i,k in enumerate(st) if k == val])+1 #Find position
PCNT_RANK = (1-(1-round(RANK/len(st),6)))*100 #Find percentage of value compared to range

print RANK, PCNT_RANK
>>> 17.0 70.8333
``````

For the percentage ranks I could probably build a distribution and sample from it, not quite sure yet, any suggestions welcome...it's going to be used heavily so any speed-up will be advantageous.

Thanks. leekaiinthesky
• `st = np.sort(arr)`. Sorting the list is on average O(n log n) time, where n is the size of the list.
• `RANK = float([i for i,k in enumerate(st) if k == val])+1`. Iterating through the list has O(n) time.
If you do need to sort the list (or have access to it pre-sorted), then the fastest option for the second step is `RANK = np.searchsorted(st, val) + 1`. Since the list is already sorted, finding the index will only take O(log n) time by binary search instead of having to iterate through the whole list. This will still be a lot faster than your original code.