Dorian Dore - 1 year ago
Python Question

Why do numbers in a string become "x0n" when a backslash precedes them?

I was doing a few experiments with escape backslashes in the Python 3.4 shell and noticed something quite strange.

>>> string = "\test\test\1\2\3"
>>> string
'\test\test\x01\x02\x03'
>>> string = "5"
>>> string
'5'
>>> string = "5\6\7"
>>> string
'5\x06\x07'

As you can see in the above code, I defined a variable string as "\test\test\1\2\3". However, when I entered string in the console, instead of printing "\test\test\1\2\3", it printed '\test\test\x01\x02\x03'. Why does this occur, and what is it used for?


In Python string literals, the \ character starts an escape sequence. \n translates to a newline character, \t to a tab, and so on. \xhh sequences let you produce a codepoint from a 2-digit hex value instead, \uhhhh sequences produce a codepoint from a 4-digit hex value, and \Uhhhhhhhh sequences produce a codepoint from an 8-digit hex value.

See the String and Bytes Literals documentation, which contains a table of all the possible escape sequences.
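As a small illustration of those escape forms (the variable names here are made up for the example), each literal below spells a one-character string, and the three hex-style escapes all produce the same letter:

```python
# Single-letter escapes map to one control character each.
newline = "\n"
tab = "\t"

# Three ways to spell the codepoint U+0041 (the letter A).
hex_a = "\x41"          # \xhh: two hex digits
u_a = "\u0041"          # \uhhhh: four hex digits
big_u_a = "\U00000041"  # \Uhhhhhhhh: eight hex digits

print(ord(newline), ord(tab))        # 10 9
print(hex_a == u_a == big_u_a == "A")  # True
```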

When Python echoes a string object in the interpreter (or you use the repr() function on a string object), then Python creates a representation of the string value. That representation happens to use the exact same Python string literal syntax, to make it easier to debug your values, as you can use the representation to recreate the exact same value.

To keep non-printable characters from either causing havoc or not being shown at all, Python uses the same escape sequence syntax to represent those characters. Thus characters that are not printable are represented using suitable \xhh sequences or, where one exists, one of the single-letter escapes (so newlines are shown as \n).
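To see the round trip in action, here is a minimal sketch. The sample value is invented for the example, and ast.literal_eval stands in for "pasting the representation back into Python":

```python
import ast

# A value containing a tab, a bell character (codepoint 7) and a newline.
value = "tab:\t bell:\x07 newline:\n"

# repr() builds the same text the interactive shell echoes.
shown = repr(value)
print(shown)

# Parsing that text as a literal recreates the original value.
rebuilt = ast.literal_eval(shown)
print(rebuilt == value)  # True
```

The bell character has no single-letter form in the representation, so it appears as \x07, while the tab and newline appear as \t and \n.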

In your example, you created non-printable characters using the \ooo octal escape sequence syntax. The digits are interpreted as an octal number to create the corresponding codepoint. When echoing that string value back, the default \xhh syntax is used to represent the exact same value in hexadecimal:

>>> '\20' # Octal for 16
'\x10'

while your \t became a tab character:

>>> print('\test')
        est

Note how there is no letter t there; instead, the remaining est is indented by whitespace, a horizontal tab.
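Looking at the codepoints directly makes both effects visible at once. This is only a sketch, with the variable names chosen for the example and the string taken from the question:

```python
# Octal 20 is decimal 16, which repr() shows as hexadecimal \x10.
sixteen = "\20"
print(sixteen == "\x10" == chr(0o20) == chr(16))  # True

# The string from the question: \t is a tab, \1 \2 \3 are octal escapes.
question = "\test\test\1\2\3"
codepoints = [ord(ch) for ch in question]
print(codepoints)      # [9, 101, 115, 116, 9, 101, 115, 116, 1, 2, 3]
print(repr(question))  # '\test\test\x01\x02\x03'
```

Codepoint 9 is the horizontal tab that replaced each \t, and 1, 2 and 3 are the characters produced by \1, \2 and \3.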

If you need to include literal \ backslash characters you need to double the character:

>>> '\\test\\1\\2\\3'
'\\test\\1\\2\\3'
>>> print('\\test\\1\\2\\3')
\test\1\2\3
>>> len('\\test\\1\\2\\3')
11

Note that the representation used doubled backslashes! If it didn't, you'd not be able to copy the string and paste it back into Python to recreate the value. Using print() to write the value to the terminal as actual characters (and not as a string representation) shows that there are single backslashes there, and taking the length shows we have just 11 characters in the string, not 15.

You can also use a raw string literal. That's just a different syntax; the string objects created from it are the exact same type, with the same value. It is simply a different way of spelling out string values. In a raw string literal, backslashes are just backslashes and escape sequences are not processed. The one restriction is that a raw string literal cannot end in a single backslash, because that backslash would stop the closing quote from ending the literal:

>>> r'\test\1\2\3'
'\\test\\1\\2\\3'
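A quick check that the two spellings really produce the same object type and value. The names are invented for the example, and the C:\dir path is purely illustrative:

```python
escaped = "\\test\\1\\2\\3"  # doubled backslashes in a normal literal
raw = r"\test\1\2\3"         # raw literal, backslashes kept as-is

print(raw == escaped)  # True
print(type(raw))       # <class 'str'>
print(len(raw))        # 11

# A raw literal cannot end in a single backslash, so one way to get a
# trailing backslash is to append it from a normal literal.
folder = r"C:\dir" + "\\"
print(folder)          # C:\dir\
```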

Last but not least, if you are creating strings that represent filenames on a Windows system, you could also use forward slashes; most Windows APIs don't mind and accept both types of slash as separators in a filename:

>>> 'C:/This/is/a/valid/path'
'C:/This/is/a/valid/path'
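If you would rather not think about separators at all, the standard library's pathlib module is one option. The sketch below uses PureWindowsPath, which only manipulates path text and so runs on any operating system; the path itself is the illustrative one from above, not a real location:

```python
from pathlib import PureWindowsPath

# Both spellings are parsed into the same Windows path.
forward = PureWindowsPath("C:/This/is/a/valid/path")
backward = PureWindowsPath(r"C:\This\is\a\valid\path")

print(forward == backward)  # True
print(str(forward))         # C:\This\is\a\valid\path
print(forward.as_posix())   # C:/This/is/a/valid/path
print(forward.drive, forward.name)  # C: path
```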