I will happily take an edit on the title - I have no clue what to call this. Basically, I'm looking for an explanation to something I just came across in my work environment.
Consider the user with the user name
Test User randomlyGeneratedString
TEST PRODUCT (DO NOT BUY)
OR products.product_name LIKE %$search_terms%
OR users.display_name LIKE lower(%$search_terms%)
SQL lower() on one side of WHERE statement works for part of result but not all?
Yes, when one side is case sensitive/binary and there are values in the column with upper case characters in them.
LOWER() will match the substring in lower case only.
In MySQL there is no difference between the values
TeSt when you are using
LIKE and both of the operands are
This is why you get results for the
products.product_name LIKE %$search_terms% regardless of the character cases in the search term or in the stored values.
The string comparison becomes more strict when one of the operands are
case sensitive or
binary. To have a match all characters should be in the correct case.
users.display_name LIKE lower(%$search_terms%) This solves nothing, unless all the values are stored in lower case in the display_name column. (As the OP stated, the
display_name column has a binary collation.) As a side effect, if you have a
UNIQUE key on this column, the values
TeSt are treated as different values and you can store both of them.
To make the above condition work as expected, you have to convert both sides to the same case:
lower(users.display_name) LIKE lower(%$search_terms%), or you can convert the collation of the column using the
users.display_name COLLATE utf8_general_ci = %$search_terms%
Please note that this may render two different values equal, (
TeSt will become equal). Also note, that using any functions/transformations on a column will prevent MySQL to use indexes on that column. This includes the
Only one question left: Why do you store those values in a case sensitive (or binary) column and want to search using case insensitive comparison?
You can check the statements above using a batch like this:
SET @bin = 'tEsT' COLLATE utf8_bin, @ci = 'tEsT' COLLATE utf8_general_ci ; SELECT @bin, LOWER(@bin), @bin LIKE '%TeSt%', -- 0 LOWER(@bin) LIKE '%TeSt%', -- 0 LOWER(@bin) LIKE LOWER('%TeSt%') -- 1 ; SELECT @ci, LOWER(@ci), @ci LIKE '%TeSt%', -- 1 LOWER(@ci) LIKE '%TeSt%', -- 1 LOWER(@ci) LIKE LOWER('%TeSt%') -- 1 ;
You mentioned, that
randomlyGeneratedString, randomlyGeneratedStrinG returned the CORRECT RESULTS. The correct result is the users with
randomlygeneratedstring in their user names at any position. Unfortunatelly you did not provide sample data what we can use to explain how string comparison works in MySQL.
Here is the list of possible user names to be returned:
Test User ztr9ejsj6a Test User 2i0jltuwrl Test User csmr0e16ry Test User 57lhpp4bfi Test User 6pvz9w7m6z
Searching ztr9ejsj6a returns 1 result:
Test User ztr9ejsj6aSearching ztr9ejsj6A (note the last letter capitalized) returns 1 result:
Test User ztr9ejsj6a.
In both cases, the search string will be converted to it's lower case variant, which is 'ztr9ejsj6a', which is available in the database as the part of one display name.
Searching 'User', 'user', and 'USER' returns 0 results
This is because, you have no values with lower case
user in the display name.
Searching any variation of TEST is irrelevant (products have TEST in their name and therefore are searched before the user name).
This is not true, you will have no results from the user table, since you have no records with the lower case
test in them.
Applying the LOWER() function on the column will render all values lower case and allows the results you may want.
Your test data is a bit wrong tho. For the random string you have only lower case values and the 'Test' and 'User' parts are always in the mentioned form. In short, your test data does not contain all possible variations and with this, you may miss to find some bugs in your logic. Consider to use this (or something like this):
Test User ztr9ejsj6a -- no match for 'user' nor for 'test' Test user 2i0jltuwrl -- match for 'user', but not for 'test' test user csmr0e16ry -- match for both 'test' and 'user' test User 98sdvg09s8 -- match for 'test' but not for 'user' Test User 57lhPP4bFi -- no match for '57lhpp4bfi' Test User 6Pvtestm6Z -- match for 'test' due to the random text!
The right side of the table above stands for all mentioned values in all case combinations (since they will be converted to their lower case variants).
As you can see, when you transform your search string to lower case, you will include some of the records, and exclude others, since the comparison is done using a case sensitive method. To prevent this either apply the same transformations to both sides or force both sides into their case insensitive form.