Codje - 1 year ago 43

Python Question

I am currently struggling to find an efficient way to compare part of a string element attached to a list, to another string element. The current code computation is very long (1 hour with 4,8 millions elements in first list and 5000 elements in second one).

What I need to do: If 8 first characters of the first string element is equal to the full second element, a third list is updated with the full first element. Once it is found, we test another element of the first list.

Here is the code:

`for first_element in first_List :`

for second_element in second_List:

if first_element[:8] == second_element :

third_List.append(first_element)

break

I know those kinds of loops are not the best way to deal with very big lists. The number of

`if`

I was wondering if there is an efficient way to do this.

I think intersection with sets won't work since I'm comparing a part of an element to a full one and I need to copy the full first element in a third list.

Do you have some suggestions or ideas please?

Answer Source

This works:

```
second_set = set(second_list)
third_list = [value for value in first_list if value[:8] in second_set]
```

Example:

```
>>> first_list = ['abcdfghij', 'xyzxyzxyz', 'fjgjgggjhhh']
>>> second_list = ['abcdfghi', 'xyzxyzxy', 'xxx']
>>> second_set = set(second_list)
>>> third_list = [value for value in first_list if value[:8] in second_set]
>>> third_list
['abcdfghij', 'xyzxyzxyz']
```

This should be much more efficient.
The conversion of the list `second_list`

into the set is `O(n)`

.
There is one loop over `first_list`

that is `O(n)`

. The lookup in the `set`

, i.e. `in second_set`

is `O(1)`

.