Graipher Graipher - 2 months ago 7
Python Question

How efficient is list.index(value, start, end)?

Today I realized that python's

list.index
can also take an optional
start
(and even
end
) parameter.

I was wondering whether or not this is efficiently implemented and which of these two is better:

pattern = "qwertyuytresdftyuioknn"
words_list = ['queen', 'quoin']
for word in words_list:
i = 1
for character in word:
try:
i += pattern[i:].index(character)
except ValueError:
break
else:
yield word


or

pattern = "qwertyuytresdftyuioknn"
words_list = ['queen', 'quoin']
for word in words_list:
i = 1
for character in word:
try:
i = pattern.index(character, i)
except ValueError:
break
else:
yield word


So basically
i += pattern[i:].index(character)
vs
i = pattern.index(character, i)
.

Searching for this on generic_search_machine returns nothing helpful, except a lot of beginner tutorials trying to teach me what a list is.

Background:
This code tries to find all words from
words_list
which match
pattern
.
pattern
is a list of characters a user entered by swiping over the keyboard, like on most modern mobile device's keyboards.

In the actual implementation there is the additional requirement that the returned word should be longer than 5 characters and the first and last character have to exactly match. These lines are omitted here for brevity, since they are trivial to implement.

Answer

This calls a built-in function implemented in C:

i = pattern.index(character, i)

Even without looking at the source code, you can always assume that the underlying implementation is smart enough to implement that efficiently, i.e. that it does not look at the first i values in the list.

As a rule of thumb, using a built-in functionality is always faster than (or at least as fast as) the best thing you can implement yourself.

The attempt to make it better:

i += pattern[i:].index(character)

This is deffinitely worse. It makes a copy of pattern[i:] and then looks for character in it.

So, in the worst case, if you have a pattern of 1 GB and i=1, this copies 1 GB of data in memory in attempt to skip the first element (which whould have been skipped anyway).