Yakym Pirozhenko - 1 year ago 48

Python Question

I was looking at the source of sorted_containers and was surprised to see this line:

`self._load, self._twice, self._half = load, load * 2, load >> 1`

Here

`load`

- (times, divide)
- (shift, shift)
- (times, shift)
- (shift, divide)

and found that #3 is consistently faster than other alternatives:

`# self._load, self._twice, self._half = load, load * 2, load >> 1`

import random

import timeit

import pandas as pd

x = random.randint(10 ** 3, 10 ** 6)

def test_naive():

a, b, c = x, 2 * x, x // 2

def test_shift():

a, b, c = x, x << 1, x >> 1

def test_mixed():

a, b, c = x, x * 2, x >> 1

def test_mixed_swaped():

a, b, c = x, x << 1, x // 2

def observe(k):

print(k)

return {

'naive': timeit.timeit(test_naive),

'shift': timeit.timeit(test_shift),

'mixed': timeit.timeit(test_mixed),

'mixed_swapped': timeit.timeit(test_mixed_swaped),

}

def get_observations():

return pd.DataFrame([observe(k) for k in range(100)])

The question:

Is my test valid? If so, why is (multiply, shift) faster than (shift, shift)?

I run Python 3.5 on Ubuntu 14.04

Answer

This seems to be because multiplication of small numbers is optimized in CPython 3.5, in a way that left shifts by small numbers are not. Positive left shifts always create a larger integer object to store the result, as part of the calculation, while for multiplications of the sort you used in your test, a special optimization avoids this and creates an integer object of the correct size. This can be seen in the source code of Python's integer implementation.

Because integers in Python are arbitrary-precision, they are stored as arrays of integer "digits", with a limit on the number of bits per integer. So in the general case, operations involving integers are not single operations, but instead need to handle the case of multiple "digits".

The beginning of the integer multiplication implementation is as follows:

```
static PyObject *
long_mul(PyLongObject *a, PyLongObject *b)
{
PyLongObject *z;
CHECK_BINOP(a, b);
/* fast path for single-digit multiplication */
if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
stwodigits v = (stwodigits)(MEDIUM_VALUE(a)) * MEDIUM_VALUE(b);
#ifdef HAVE_LONG_LONG
return PyLong_FromLongLong((PY_LONG_LONG)v);
#else
/* if we don't have long long then we're almost certainly
using 15-bit digits, so v will fit in a long. In the
unlikely event that we're using 30-bit digits on a platform
without long long, a large v will just cause us to fall
through to the general multiplication code below. */
if (v >= LONG_MIN && v <= LONG_MAX)
return PyLong_FromLong((long)v);
#endif
}
```

So when multiplying two integers where each individually fits in a small enough integer, and their product fits in a small enough integer, this is done as a direct multiplication by the CPython interpreter, instead of working with the integers as arrays.

In the case of a single-digit result, `PyLong_FromLongLong()`

will calculate that this is the case in a relatively small number of operations, and create a single-digit integer object to store it.

In contrast, left shifts are not optimized this way, and every left shift deals with the integer being shifted as an array. In particular, if you look at the source code for `long_lshift()`

, in the case of a small but positive left shift, a 2-"digit" integer object is always created, if only to be replaced later with a single-digit one: *(my comments in /*** ***/)*

```
static PyObject *
long_lshift(PyObject *v, PyObject *w)
{
/*** ... ***/
wordshift = shiftby / PyLong_SHIFT; /*** zero for small w ***/
remshift = shiftby - wordshift * PyLong_SHIFT; /*** w for small w ***/
oldsize = Py_ABS(Py_SIZE(a)); /*** 1 for small v ***/
newsize = oldsize + wordshift;
if (remshift)
++newsize; /*** here newsize becomes at least 2 for w > 0 ***/
z = _PyLong_New(newsize);
/*** ... ***/
}
```