Alan Alan - 4 days ago 6
MySQL Question

Django startswith vs endswith performance on MySQL

Lets say I have the following model

class Person(models.Model):
name = models.CharField(max_length=20, primary_key=True)


So I would have objects in the database like

Person.objects.create(name='alex white')
Person.objects.create(name='alex chen')
Person.objects.create(name='tony white')


I could then subsequently query for all users whose first name is
alex
or last name is
white
by doing the following

all_alex = Person.objects.filter(name__startswith='alex')
all_white = Person.objects.filter(name__endswith='white')


I do not know how Django implements this under the hood, but I am going to guess it is with a SQL
LIKE 'alex%'
or
LIKE '%white'


However, since according to MySQL index documentation, since the primary key index can only be used (e.g. as opposed to a full table scan) if
%
appears on the end of the
LIKE
query.

Does that mean that, as the database grows,
startswith
will be viable - whereas
endswith
will not be since it will resort to full table scans?


Am I correct or did I go wrong somewhere? Keep in mind these are not facts but just my deductions that I made from general assumptions - hence why I am asking for confirmation.

Answer

Yes, your understanding is correct.

select *
from foo
where bar like 'text1%' and bar like '%text2'

is not necessarily optimal. This could be an improvement:

select *
from (select *
      from foo
      where foo.bar like 'text1%') t
where t.bar like '%text2'

You need to make measurements to check whether this is better. If it is, the cause is thatin the inner query you use an index, while in the outer query you not use an index, but the set is prefiltered by the first query, therefore you have a much smaller set to query.

I am not at all a Django expert, so my answer might be wrong, but I believe chaining your filter would be helpful if filter actually executes the query. If that is the case, then you can use the optimization described above. If filter just prepares a query and chaining filters will result in a single query different from the one above, then I recommend using hand-written MySQL. However, if you do not have performance issues yet, then it is premature to optimize it, since you cannot really test the amount of performance you gained.

Comments