Varlor - 5 months ago
Python Question

Python: Best way to store the top ten numbers

I have the following problem: I run parameter tests and create a new object for every single parameter combination; each object is replaced by the next one, created with other parameters. The object has an attribute for its Jaccard coefficient and an attribute for its ID. In every step I want to store the Jaccard coefficient of the object. At the end I want the top ten Jaccard coefficients and their corresponding IDs.

import itertools
import os

import numpy as np

# parser and FRC_Network are defined elsewhere in the script
r = ["%.2f" % v for v in np.arange(3, 5, 1)]
fs = ["%.2f" % v for v in np.arange(2, 5, 1)]
co = ["%.2f" % v for v in np.arange(1, 5, 1)]
frc_networks = []

bestJC = []
bestPercent = []
best10Candidates = []
count = 0
for parameters in itertools.product(r, fs, co):
    args = parser.parse_args(["path1.csv", "path2.csv", "--r", parameters[0], "--fs", parameters[1], "--co", parameters[2]])

    # Only build the network if its output file does not exist yet
    if not os.path.isfile('FCR_Network_Coordinates_ID_{}_r_{}_x_{}_y_{}_z_{}_fcr_{}_co_{}_1.csv'.format(count, args.r, args.x, args.y, args.z, args.fs, args.co)):
        FRC_Network(count, args.p[0], args.p[1], args.x, args.y, args.z, args.r, args.fs, args.co)


The attributes can be accessed as FRC_Network.ID and FRC_Network.JC.

Answer

I think I'd use heapq.heappushpop() for this. That way, no matter how large your input set is, your memory requirement is limited to a list of 10 tuples.
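
For reference, here is a minimal sketch of what heappushpop() does on a small heap; the numbers are made up for illustration. It pushes the new item and then pops and returns the smallest one, so the heap never grows.

```python
import heapq

heap = [3, 5, 8]          # illustrative values, not from the question
heapq.heapify(heap)

# 6 enters the heap, and the smallest item (3) is pushed out.
dropped_first = heapq.heappushpop(heap, 6)

# 1 is smaller than everything kept, so it comes straight back out.
dropped_second = heapq.heappushpop(heap, 1)

print(dropped_first, dropped_second, sorted(heap))  # 3 1 [5, 6, 8]
```

Because the smallest element is always the one discarded, whatever remains in the heap is the set of largest values seen so far.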

Note the use of tuples to keep the JC and ID values together. Tuples are compared lexicographically, element by element, so the heap orders entries by JC first; the ID is only consulted to break a tie between equal JC values.
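
A quick sketch of that tuple ordering, using made-up (JC, ID) pairs with integer IDs (an assumption; the question does not say what type the ID is):

```python
# Tuples compare element by element: JC first, ID only on a tie.
a = (0.75, 3)
b = (0.60, 9)
c = (0.75, 1)

assert a > b   # 0.75 > 0.60, so the IDs are never compared
assert a > c   # JC values are equal, so the ID breaks the tie: 3 > 1

ranked = sorted([a, b, c], reverse=True)
print(ranked)  # [(0.75, 3), (0.75, 1), (0.6, 9)]
```

One caveat: if two JC values are equal and the IDs are of types that cannot be compared with each other, the comparison raises a TypeError.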

Also, note that the final call to .sort() is optional. If you just want the ten best, skip the call. If you want the ten best in order, keep the call.

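import heapq
import itertools

# UNTESTED
best = []  # min-heap holding at most the ten largest (JC, ID) tuples
for parameters in itertools.product(r, fs, co):
    # ...
    if len(best) < 10:
        heapq.heappush(best, (FRC_Network.JC, FRC_Network.ID))
    else:
        # Push the new tuple, then drop the smallest of the eleven
        heapq.heappushpop(best, (FRC_Network.JC, FRC_Network.ID))
best.sort(reverse=True)  # optional: order the ten from best to worst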

Here is a tested version that demonstrates the concept:

import heapq
import random
from pprint import pprint

best = []
for ID in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ':
    JC = random.randint(0, 100)
    if len(best) < 10:
        heapq.heappush(best, (JC, ID))
    else:
        heapq.heappushpop(best, (JC, ID))
pprint(best)

Result (from one random run; the list is printed in heap order, not sorted):

[(81, 'E'),
 (82, 'd'),
 (83, 'G'),
 (92, 'i'),
 (95, 'Z'),
 (100, 'p'),
 (89, 'q'),
 (98, 'a'),
 (96, 'z'),
 (97, 'O')]
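
If it helps, the same pattern can be wrapped in a small helper. This is only a sketch: top_n and the sample data are hypothetical names and values, not part of the question's code. It is cross-checked against heapq.nlargest(), which gives the same answer when all the pairs are available as one iterable.

```python
import heapq


def top_n(pairs, n=10):
    """Return the n largest (JC, ID) tuples, best first, using a bounded min-heap."""
    best = []
    for item in pairs:
        if len(best) < n:
            heapq.heappush(best, item)
        else:
            # Push the new item, then discard the smallest of the n + 1
            heapq.heappushpop(best, item)
    return sorted(best, reverse=True)


# Made-up data: 50 (JC, ID) pairs with distinct scores between 0 and 1
pairs = [((i * 37) % 50 / 50, i) for i in range(50)]

result = top_n(pairs)
assert result == heapq.nlargest(10, pairs)
print(result[0])  # the pair with the highest JC
```

The explicit loop is useful when results arrive one at a time, as in the parameter sweep; heapq.nlargest() is shorter when everything is already in a list or generator.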