Koolaid Koolaid - 1 year ago 51
Python Question

Efficiently comparing the first item in each list of two large list of lists?

I'm currently working with a a large list of lists (~280k lists) and a smaller one (~3.5k lists). I'm trying to efficiently compare the first index in the smaller list to the first index in the large list. If they match, I want to return both lists from the small and large list that have a matching first index.

For example:

Large List 1:


Smaller list 2:


Would return


I currently have it setup as shown here, where a list of tuples is returned with each tuple holding the two lists that have a matching first element. I'm fine with any other data structures being used. I was trying to use a set of tuples but was having issues trying to figure out how to do it quicker than what I already have.

My code to to compare these two list of lists is currently this:

match = []
for list_one in small_list:
for list_two in large_list:
if str(list_one[0]).lower() in str(list_two[0]).lower():
match.append((spm_values, cucm_values))
return match

Answer Source

Assuming order doesn't matter, I would highly recommend using a dictionary to map prefix (one character) to items and set to find matches:

# generation of data... not important
>>> lst1 = [list(c) for c in ["abcd", "efgh", "ijkl", "mnop"]]
>>> lst2 = [list(c) for c in ["eqrs", "atws"]]

# mapping prefix to list (assuming uniqueness)
>>> by_prefix1 = {chars[0]: chars for chars in lst1}
>>> by_prefix2 = {chars[0]: chars for chars in lst2}

# actually finding matches by intersecting sets (fast)
>>> common = set(by_prefix1.keys()) & set(by_prefix2.keys())
>>> tuples = tuple(((by_prefix1[k], by_prefix2[k]) for k in common))
>>> tuples