user248237dfsf - 1 year ago
Python Question

efficiently checking that string consists of one character in Python

What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A'), which would behave like this:

all_equal("AAAAA", "A") = True

all_equal("AAAAAAAAAAA", "A") = True

all_equal("AAAAAfAAAAA", "A") = False

Two seemingly inefficient ways would be to convert the string to a list and check each element, or to use a regular expression. Are there more efficient ways, or are these the best one can do in Python? Thanks.

Answer

This is by far the fastest approach, several times faster than even count(); you can time it yourself with mgilson's excellent timing suite:

s == len(s) * s[0]

Here all the checking is done inside Python's C code, which just:

  • allocates len(s) characters;
  • fills the space with the first character;
  • compares two strings.
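The three steps above can be wrapped in the all_equal(s, 'A') shape the question asks for. This is only a minimal sketch: the function name comes from the question, while the parameter name ch and the choice to compare against a given character rather than s[0] are assumptions made here.

```python
def all_equal(s, ch):
    """Return True if every character of s equals the single character ch."""
    # len(s) * ch allocates a new string of len(s) copies of ch;
    # the == comparison then runs entirely in C, with no per-character Python code.
    return s == len(s) * ch


print(all_equal("AAAAA", "A"))        # True
print(all_equal("AAAAAfAAAAA", "A"))  # False
```

Because it does not index s[0], this variant returns True for an empty string instead of raising an exception.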

The longer the string, the greater the time savings. However, as mgilson writes, this creates a copy of the string, so if your string is many millions of symbols long, the extra memory may become a problem.
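If the temporary copy is a concern, a count()-based check is one way to avoid it, at the cost of the speed difference mentioned above. The sketch below is an illustration only: the helper name all_equal_no_copy is hypothetical, and it assumes ch is a single character.

```python
def all_equal_no_copy(s, ch):
    """Hypothetical helper: True if every character of s equals ch."""
    # str.count scans s in C and returns an integer,
    # so no second string of length len(s) is built.
    return s.count(ch) == len(s)


print(all_equal_no_copy("AAAAA", "A"))        # True
print(all_equal_no_copy("AAAAAfAAAAA", "A"))  # False
```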

As the timing results show, the fastest ways to solve the task generally do not execute any Python code per symbol. The set() solution also does all of its work inside the C code of the Python library, yet it is still slow, probably because it handles each character of the string through the generic Python object interface.
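To reproduce such a comparison without the original timing suite, something like the following timeit sketch could be used. The input size, the number of runs, and the exact expressions for the count() and set() variants are assumptions made here, and the measured numbers will vary by machine and Python version.

```python
import timeit

s = "A" * 100_000  # hypothetical test input: one repeated character

# Three candidate checks; each returns True when s has a single distinct character.
candidates = {
    "multiply": lambda: s == len(s) * s[0],
    "count": lambda: s.count(s[0]) == len(s),
    "set": lambda: len(set(s)) == 1,
}

timings = {}
for name, check in candidates.items():
    assert check()  # sanity check: all variants agree on this input
    timings[name] = timeit.timeit(check, number=20)

# Print the variants from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {seconds:.6f} s for 20 runs")
```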

UPD: Concerning the empty-string case: what to do with it depends strongly on the task. If the task is "check whether all the symbols in a string are the same", s == len(s) * s[0] is a valid answer (no symbols means an error, and the IndexError raised by s[0] is acceptable). If the task is "check whether there is exactly one unique symbol", the empty string should give False, and the answer is s and s == len(s) * s[0], or bool(s) and s == len(s) * s[0] if you prefer to receive boolean values. Finally, if we understand the task as "check that there are no differing symbols", the result for the empty string is True, and the answer is not s or s == len(s) * s[0].
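The three readings of the task can be put side by side as small functions. The expressions are the ones given above; the function names are hypothetical and chosen only for this illustration.

```python
def all_symbols_same(s):
    """'All symbols are the same': raises IndexError on an empty string."""
    return s == len(s) * s[0]


def exactly_one_unique_symbol(s):
    """'Exactly one unique symbol': an empty string gives False."""
    return bool(s) and s == len(s) * s[0]


def no_different_symbols(s):
    """'No different symbols': an empty string gives True."""
    return not s or s == len(s) * s[0]


print(exactly_one_unique_symbol(""))  # False
print(no_different_symbols(""))       # True
try:
    all_symbols_same("")
except IndexError:
    print("IndexError for the empty string")
```

For any non-empty string the three functions return the same result; they differ only in how they treat the empty string.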