Jacques Gaudin Jacques Gaudin - 2 months ago 8
Python Question

Projecting python list onto existing values (with generator of generators)

Given a constant, a sorted list of floats and a list of 2-element list of floats,

CONST = 1.
lst1 = [1.2, 2.4, 3.1] #sorted
lst2 = [[2.0, 0.9], [3.1, 1.5], [1.0, 3.0], [2.5, 2.0]]


I construct a new list with the pairs
[CONST, a]
for all the elements of
lst1


I need to "project"
list2
on the list of
[CONST,a]
: the second value of the pairs in
lst2
will be changed to the closest value in list1 and the first value of the pair will be the sum of all the first values for a same second value.

So the result with the example given would be:

[[6.1, 1.2], [3.5, 2.4], [2.0, 3.1]]


So far I have something like:

from itertools import groupby
from bisect import bisect
from operator import itemgetter

for t in lst2:
i = bisect(lst1, t[1])
bounds = lst1[i-1:i+1] if i else [lst1[0]]
t[1] = min(bounds, key=lambda x: abs(x-t[1]))

lst2 += [[CONST, a] for a in lst1]
lst2 = sorted(lst2, key=itemgetter(1))
res = [[sum([t[0] for t in group]), keys] for keys, group in groupby(lst2, itemgetter(1))]


but the lists (esp.
lst1
) can be long (1e5+) and I have a feeling that I could have better efficiency there. Any ideas ?

Answer

I think I found an intersting way of doing it: a generator of generators ! Python is great ! This is about twice as fast as the previous code.

CONST = 1.
lst1 = [1.2, 2.4, 3.1] #sorted
lst2 = [[2.0, 0.9], [3.1, 1.5], [1.0, 3.0], [2.5, 2.0]]

def gen_generator(lst1):
    #Create iterators to form consecutive pairs
    it1_1, it1_2 = iter(lst1), islice(lst1+[float('inf')], 1, None)
    #Create iterator over items to insert and initialize p
    gen_generator.it2 = iter(sorted(lst2, key=itemgetter(1)))
    gen_generator.p=next(gen_generator.it2)
    for t1, t2 in zip(it1_1, it1_2):
        #Calculate the mid point of the pair t1, t2
        sup = (t1+t2)/2 if t2 != float('inf') else float('inf')
        #Create a generator for each item in lst1
        def generator():
            #All the previously inserted items have disappeared
            #So yield any p such that p[1] < sup
            while gen_generator.p is not None and gen_generator.p[1] < sup:
                yield gen_generator.p[0]
                #Move to next or None
                gen_generator.p = next(gen_generator.it2, None)
            #Add the constant
            yield CONST
        yield generator

res = [[sum(gen()), t] for gen, t in zip(gen_generator(lst1), lst1)]
return res