Gaut Gaut - 19 days ago 10
Python Question

Python string starting with a dot fails ID test?

I tried this in Python 2.7:

In [1]: s = 'abc'

In [2]: s is 'abc'
Out[2]: True


In [3]: s = '.abc'

In [4]: s is '.abc'
Out[4]: False


Why does the second test return False?

Answer

The answer is: because Python tries to detect which string literals look like identifiers and interns them automatically, so that comparing them can be done as an O(1) identity check.

The CPython 2.7 source (Objects/codeobject.c) contains the following function:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}

It is called on every string literal in your code to decide whether the literal looks like an identifier and should be interned.
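For readers who do not read C, here is a rough Python equivalent of that check. It is only a sketch of the same logic, not code taken from the interpreter; the Python names NAME_CHARS and all_name_chars simply mirror the C ones.

```python
import string

# Same character set as NAME_CHARS in the C code:
# ASCII letters, digits, and the underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """Return True iff every character of s is in NAME_CHARS."""
    return all(ch in NAME_CHARS for ch in s)


print(all_name_chars("abc"))   # True: only letters
print(all_name_chars(".abc"))  # False: '.' is not in NAME_CHARS
```

Note that, like the C version, this only checks the character set. A string such as "123" passes even though it is not a valid identifier, and so does the empty string.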

Programs often use strings as identifiers, for instance as dictionary keys or as flags of some sort. Comparing such strings needs to be very fast, which is possible when it only requires checking object identity. So Python detects all such strings in your code and makes them point to unique objects. That's why your first comparison returns True.
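You can see the effect of interning directly by doing it by hand. The sketch below assumes CPython; identity of equal strings is an implementation detail, so other interpreters may behave differently. The strings are built at runtime so that the compiler cannot share a single constant between them. intern() is a builtin in Python 2 and lives in sys in Python 3.

```python
try:
    from sys import intern  # Python 3
except ImportError:
    pass  # Python 2: intern() is a builtin

# Build two equal strings at runtime so they are separate objects.
a = "".join([".", "abc"])
b = "".join([".", "abc"])

print(a == b)                  # True: same contents
print(a is b)                  # False in CPython: two distinct objects
print(intern(a) is intern(b))  # True: both map to the single interned object
```

For an ordinary equality test, == compares contents and does not depend on whether either string happens to be interned.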

However, interning strings takes memory, so Python tries not to intern strings that look like natural language or text. If a string contains any character that is not an ASCII letter, a digit, or an underscore (_), it is not interned. That is why your second comparison returns False: the leading dot in '.abc' disqualifies it.

You can find more information about this here: http://guilload.com/python-string-interning/
