Codje - 7 months ago 8
Python Question

# Efficient way to loop on comparing string list element to a string list sub-element

I am currently struggling to find an efficient way to compare part of a string element attached to a list, to another string element. The current code computation is very long (1 hour with 4,8 millions elements in first list and 5000 elements in second one).

What I need to do: If 8 first characters of the first string element is equal to the full second element, a third list is updated with the full first element. Once it is found, we test another element of the first list.

Here is the code:

``````for first_element in first_List :
for second_element in second_List:
if first_element[:8] == second_element :
third_List.append(first_element)
break
``````

I know those kinds of loops are not the best way to deal with very big lists. The number of
`if`
tests is really huge.
I was wondering if there is an efficient way to do this.

I think intersection with sets won't work since I'm comparing a part of an element to a full one and I need to copy the full first element in a third list.

Do you have some suggestions or ideas please?

This works:

``````second_set = set(second_list)
third_list = [value for value in first_list if value[:8] in second_set]
``````

Example:

``````>>> first_list = ['abcdfghij', 'xyzxyzxyz', 'fjgjgggjhhh']
>>> second_list = ['abcdfghi', 'xyzxyzxy', 'xxx']
>>> second_set = set(second_list)
>>> third_list = [value for value in first_list if value[:8] in second_set]
>>> third_list
['abcdfghij', 'xyzxyzxyz']
``````

This should be much more efficient. The conversion of the list `second_list` into the set is `O(n)`. There is one loop over `first_list` that is `O(n)`. The lookup in the `set`, i.e. `in second_set` is `O(1)`.