Marcelo Assis - 8 months ago
Python Question

Why does Python return True when checking if an empty string is in another?

My limited brain cannot understand why this happens:

>>> print '' in 'lolsome'
True

In PHP, an equivalent comparison returns false:

var_dump(strpos('', 'lolsome'));


From the documentation:

For the Unicode and string types, x in y is true if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Note, x and y need not be the same type; consequently, u'ab' in 'abc' will return True. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
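The equivalence quoted above is easy to check directly. Here is a minimal sketch, written so that it runs on both Python 2 and Python 3; the helper name is_substring is made up for illustration and is not part of any library:

```python
def is_substring(x, y):
    # Mirrors the documented equivalence: x in y  <=>  y.find(x) != -1
    return y.find(x) != -1

haystack = 'lolsome'

# The empty string is found at index 0, so find() never returns -1 for it.
empty_index = haystack.find('')

# The empty string matches at every slice boundary: before each character
# and once more after the last one.
empty_count = haystack.count('')

print(is_substring('', haystack) == ('' in haystack))
print(empty_index)
print(empty_count)
```

So the empty string is not a special case bolted onto the in operator: it falls out of how find() already treats an empty needle.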

From looking at your print call, you're using 2.x.

To go deeper, look at the bytecode:

>>> import dis
>>> def answer():
...   '' in 'lolsome'

>>> dis.dis(answer)
  2           0 LOAD_CONST               1 ('')
              3 LOAD_CONST               2 ('lolsome')
              6 COMPARE_OP               6 (in)
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
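If you want to repeat this on a Python 3 interpreter, the listing will look different. A hedged sketch, assuming Python 3.4 or later (for dis.get_instructions): as far as I know, 3.9 and newer emit a dedicated CONTAINS_OP for in, while older versions use COMPARE_OP as shown above. The function name contains is just for illustration.

```python
import dis

def contains(needle, haystack):
    # Arguments are used instead of literals so the compiler cannot
    # pre-compute the result at compile time.
    return needle in haystack

# Collect the opcode names the compiler emitted for contains().
opnames = [ins.opname for ins in dis.get_instructions(contains)]

# Keep only the opcode that implements the membership test.
membership_ops = [name for name in opnames
                  if name in ('COMPARE_OP', 'CONTAINS_OP')]

print(membership_ops)
print(contains('', 'lolsome'))
```

Whichever opcode name is printed, the membership test ends up in the same C-level call discussed next.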

COMPARE_OP is where the boolean operation takes place. Looking at the interpreter source for that opcode (ceval.c) reveals where the comparison happens:

        w = POP();
        v = TOP();
        x = cmp_outcome(oparg, v, w);
        if (x == NULL) break;

cmp_outcome is in the same file, so it's easy to find our next clue:

res = PySequence_Contains(w, v);

which is in abstract.c:

    Py_ssize_t result;
    if (PyType_HasFeature(seq->ob_type, Py_TPFLAGS_HAVE_SEQUENCE_IN)) {
        PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
        if (sqm != NULL && sqm->sq_contains != NULL)
            return (*sqm->sq_contains)(seq, ob);
    }
    result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
    return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);

Coming up for air from the source, we find this slot described in the documentation:

objobjproc PySequenceMethods.sq_contains

This function may be used by PySequence_Contains() and has the same signature. This slot may be left to NULL, in this case PySequence_Contains() simply traverses the sequence until it finds a match.

and further down in the same documentation:

int PySequence_Contains(PyObject *o, PyObject *value)

Determine if o contains value. If an item in o is equal to value, return 1, otherwise return 0. On error, return -1. This is equivalent to the Python expression value in o.
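The two paths described in these excerpts (use sq_contains if the type provides it, otherwise walk the sequence) can be mimicked in pure Python, where __contains__ plays the role of sq_contains. This is only an illustrative sketch; the class names WithContains and IterOnly are made up, and the iterated flag exists purely to show which path was taken.

```python
class WithContains(object):
    """Defines __contains__, the Python-level counterpart of sq_contains."""
    def __init__(self, items):
        self.items = items
        self.iterated = False

    def __contains__(self, value):
        # Delegate to the wrapped string's own substring search.
        return value in self.items

    def __iter__(self):
        self.iterated = True
        return iter(self.items)


class IterOnly(object):
    """No __contains__: `in` has to walk the items one by one."""
    def __init__(self, items):
        self.items = items
        self.iterated = False

    def __iter__(self):
        self.iterated = True
        return iter(self.items)


fast = WithContains('lolsome')
slow = IterOnly('lolsome')

fast_result = '' in fast   # handled by __contains__, never iterates
slow_result = '' in slow   # compares '' with 'l', 'o', 'l', ... in turn

print(fast_result, fast.iterated)
print(slow_result, slow.iterated)
```

The fallback path only tests each item for equality, and no single character equals ''. The True result for strings therefore comes from the string type's own containment routine, not from the generic traversal.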

Since '' isn't null (it is a real, if empty, string object), the sequence 'lolsome' can be thought to contain it.