I'm struggling to find an efficient way to compare part of each string in one list with the strings in another list. My current code is very slow: it takes about 1 hour with 4.8 million elements in the first list and 5,000 elements in the second one.
What I need to do: if the first 8 characters of an element of the first list are equal to a full element of the second list, the full first element is appended to a third list. Once a match is found, we move on to the next element of the first list.
Here is the code:

    third_list = []
    for first_element in first_list:
        for second_element in second_list:
            if first_element[:8] == second_element:
                third_list.append(first_element)
                break

Convert second_list into a set once, then filter first_list with a list comprehension:

    second_set = set(second_list)
    third_list = [value for value in first_list if value[:8] in second_set]

For example:

    >>> first_list = ['abcdfghij', 'xyzxyzxyz', 'fjgjgggjhhh']
    >>> second_list = ['abcdfghi', 'xyzxyzxy', 'xxx']
    >>> second_set = set(second_list)
    >>> third_list = [value for value in first_list if value[:8] in second_set]
    >>> third_list
    ['abcdfghij', 'xyzxyzxyz']

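If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name filter_by_prefix and the prefix_len parameter are hypothetical, added here so the prefix length is not hard-coded to 8.

```python
from typing import Iterable, List


def filter_by_prefix(first_list: Iterable[str],
                     second_list: Iterable[str],
                     prefix_len: int = 8) -> List[str]:
    """Return the elements of first_list whose first prefix_len characters
    equal some element of second_list, keeping the order of first_list.

    Hypothetical helper name, not part of any library.
    """
    # Build the set once: O(m) for m elements in second_list.
    wanted = set(second_list)
    # Single pass over first_list; each "in wanted" test is O(1) on average.
    return [value for value in first_list if value[:prefix_len] in wanted]


if __name__ == "__main__":
    first_list = ['abcdfghij', 'xyzxyzxyz', 'fjgjgggjhhh']
    second_list = ['abcdfghi', 'xyzxyzxy', 'xxx']
    print(filter_by_prefix(first_list, second_list))  # ['abcdfghij', 'xyzxyzxyz']
```

Like the nested loops, each element of first_list is added at most once, and an element shorter than prefix_len is compared as a whole, so the two versions return the same list.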
This should be much more efficient.
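Converting second_list into a set is O(m), where m is the number of elements in second_list. There is a single loop over first_list, which is O(n). Each lookup in second_set is O(1) on average. Overall the filter is O(n + m) on average, instead of O(n * m) for the nested loops.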
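With the sizes from the question, the nested loops perform up to 4,800,000 × 5,000 = 24,000,000,000 string comparisons, whereas the set version does about 5,000 insertions plus 4,800,000 average-O(1) lookups. The sketch below checks on synthetic data that both approaches return the same list and times them with timeit. The random data and the much smaller sizes (2,000 and 500) are assumptions chosen so the slow version finishes quickly; absolute timings will vary by machine.

```python
import random
import string
import timeit


def nested_loops(first_list, second_list):
    # The approach from the question: scan second_list until a match is found.
    third_list = []
    for first_element in first_list:
        for second_element in second_list:
            if first_element[:8] == second_element:
                third_list.append(first_element)
                break
    return third_list


def with_set(first_list, second_list):
    # The set-based approach: O(n + m) on average instead of O(n * m).
    second_set = set(second_list)
    return [value for value in first_list if value[:8] in second_set]


def random_word(rng, length):
    return ''.join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_element(rng, second_list):
    # About half the elements start with a known 8-character key.
    if rng.random() < 0.5:
        return rng.choice(second_list) + random_word(rng, 3)
    return random_word(rng, 11)


rng = random.Random(0)  # fixed seed so the run is reproducible
# Hypothetical sizes, far smaller than the 4.8 million / 5,000 in the question.
second_list = [random_word(rng, 8) for _ in range(500)]
first_list = [make_element(rng, second_list) for _ in range(2_000)]

# Both approaches must produce the same list, in the same order.
assert nested_loops(first_list, second_list) == with_set(first_list, second_list)

for fn in (nested_loops, with_set):
    seconds = timeit.timeit(lambda: fn(first_list, second_list), number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

The gap widens as second_list grows, because the nested loops scale with its length for every element of first_list while the set lookup does not.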