Koolaid Koolaid - 4 months ago 15
Python Question

Efficiently comparing the first item in each list of two large list of lists?

I'm currently working with a a large list of lists (~280k lists) and a smaller one (~3.5k lists). I'm trying to efficiently compare the first index in the smaller list to the first index in the large list. If they match, I want to return both lists from the small and large list that have a matching first index.

For example:

Large List 1:

[[a,b,c,d],[e,f,g,h],[i,j,k,l],[m,n,o,p]]


Smaller list 2:

[[e,q,r,s],[a,t,w,s]]


Would return

[([e,q,r,s],[e,f,g,h]),([a,t,w,s],[a,b,c,d])]


I currently have it setup as shown here, where a list of tuples is returned with each tuple holding the two lists that have a matching first element. I'm fine with any other data structures being used. I was trying to use a set of tuples but was having issues trying to figure out how to do it quicker than what I already have.

My code to to compare these two list of lists is currently this:

match = []
for list_one in small_list:
for list_two in large_list:
if str(list_one[0]).lower() in str(list_two[0]).lower():
match.append((spm_values, cucm_values))
break
return match

Answer

Assuming order doesn't matter, I would highly recommend using a dictionary to map prefix (one character) to items and set to find matches:

# generation of data... not important
>>> lst1 = [list(c) for c in ["abcd", "efgh", "ijkl", "mnop"]]
>>> lst2 = [list(c) for c in ["eqrs", "atws"]]

# mapping prefix to list (assuming uniqueness)
>>> by_prefix1 = {chars[0]: chars for chars in lst1}
>>> by_prefix2 = {chars[0]: chars for chars in lst2}

# actually finding matches by intersecting sets (fast)
>>> common = set(by_prefix1.keys()) & set(by_prefix2.keys())
>>> tuples = tuple(((by_prefix1[k], by_prefix2[k]) for k in common))
>>> tuples
Comments