Aaron Hall - 1 month ago
Python Question

What is the relationship between the Python data model and built-in functions?

As I read Python answers on Stack Overflow, I continue to see some people telling users to use the data model's special methods or attributes directly.

I then see contradicting advice (sometimes from myself) saying not to do that, and instead to use builtin functions and the operators directly.

Why is that? What is the relationship between the special "dunder" methods and attributes of the Python data model and builtin functions?

When am I supposed to use the special names?


What is the relationship between the Python datamodel and builtin functions?

  • The builtins and operators use the underlying datamodel methods or attributes.
  • The builtins and operators have more elegant behavior and are in general more forward compatible.
  • The special methods of the datamodel are semantically non-public interfaces.

Thus, you should prefer to use the builtins and operators where possible over the special methods of the datamodel.

In depth

The builtin functions and operators invoke the special methods and use the special attributes in the Python datamodel. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.

The builtin functions and operators can also have fallback or more graceful behavior than the more primitive datamodel special methods. For example:

  • next(obj, default) allows you to provide a default instead of raising StopIteration when an iterator runs out, while obj.__next__() does not.
  • str(obj) falls back to obj.__repr__() when the class defines no __str__ - whereas calling obj.__str__() directly can raise an AttributeError (for example, on Python 2 old-style instances).
  • obj != other falls back to not obj == other in Python 3 when there is no __ne__ - calling obj.__ne__(other) directly skips the operator's handling and can return NotImplemented instead of a boolean.
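The three fallbacks above can be checked with a short sketch. It assumes Python 3, and Countdown and Point are hypothetical classes made up for illustration:

```python
# Illustrative only: Countdown and Point are hypothetical classes.
class Countdown:
    """Iterator that yields n-1, ..., 0."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n


class Point:
    """Defines __repr__ and __eq__ only; no __str__ and no __ne__."""
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return 'Point({!r})'.format(self.x)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x


it = Countdown(1)
first = next(it, 'done')        # 0
exhausted = next(it, 'done')    # 'done', no StopIteration raised

text = str(Point(1))            # 'Point(1)', taken from __repr__
differs = Point(1) != Point(2)  # True, derived from __eq__
raw = Point(1).__ne__('spam')   # NotImplemented, not a boolean
```

The last line is the kind of surprise the operator protects you from: Point(1) != 'spam' evaluates to True.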

(Builtin functions can also be easily shadowed, if necessary or desirable, in a module's global scope, to further customize behavior.)
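As a rough illustration of shadowing a builtin at module scope, the sketch below redefines len so that None counts as empty. It assumes Python 3 (where the original lives in the builtins module), and the None-handling rule is an invented example rather than a recommendation:

```python
import builtins


def len(obj):
    """Module-level override: treat None as empty, else defer to the builtin."""
    if obj is None:
        return 0
    return builtins.len(obj)


empty = len(None)    # 0 instead of a TypeError
three = len('abc')   # 3, via the real builtin
```

Only code in this module sees the override; other modules still get the original len.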

Mapping the builtins and operators to the datamodel

Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. The usual rule is that a builtin function maps to a special method of the same name, but the exceptions are numerous enough to warrant giving the map below:

builtins/     special methods/
operators  -> datamodel               NOTES (fb == falls back)

repr(obj)     obj.__repr__()
str(obj)      obj.__str__()           fb to __repr__ if no __str__
bytes(obj)    obj.__bytes__()         Python 3 only
unicode(obj)  obj.__unicode__()       Python 2 only
format(obj)   obj.__format__()        format spec optional.
hash(obj)     obj.__hash__()
bool(obj)     obj.__bool__()          Python 3, fb to __len__
bool(obj)     obj.__nonzero__()       Python 2, fb to __len__
dir(obj)      obj.__dir__()
vars(obj)     obj.__dict__            does not include __slots__
type(obj)     obj.__class__
help(obj)     obj.__doc__             help uses more than just __doc__
len(obj)      obj.__len__()
iter(obj)     obj.__iter__()          fb to __getitem__ w/ indexes from 0 on
next(obj)     obj.__next__()          Python 3
next(obj)     obj.next()              Python 2
reversed(obj) obj.__reversed__()      fb to __len__ and __getitem__
other in obj  obj.__contains__(other) fb to __iter__ then __getitem__
obj == other  obj.__eq__(other)
obj != other  obj.__ne__(other)       fb to not obj.__eq__(other) in Python 3
obj < other   obj.__lt__(other)       get >, >=, <= with @functools.total_ordering
complex(obj)  obj.__complex__()
int(obj)      obj.__int__()
float(obj)    obj.__float__()
round(obj)    obj.__round__()
abs(obj)      obj.__abs__()
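Several of the fallbacks noted in the table can be seen with one small class. This is a sketch assuming Python 3; Deck is a hypothetical class that defines only __len__ and __getitem__:

```python
class Deck:
    """Hypothetical sequence-like class with only __len__ and __getitem__."""
    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]


deck = Deck(['A', 'K', 'Q'])
as_list = list(iter(deck))        # ['A', 'K', 'Q'], via __getitem__(0), (1), ...
backwards = list(reversed(deck))  # ['Q', 'K', 'A'], via __len__ and __getitem__
has_king = 'K' in deck            # True, found by iterating
empty_is_falsy = not Deck([])     # True, truth testing falls back to __len__
```

None of __iter__, __reversed__, __contains__ or __bool__ is defined here, so calling any of them directly on deck would fail even though the builtins and operators work.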

The subscript notation is contextual:

obj[name]         -> obj.__getitem__(name)
obj[name] = item  -> obj.__setitem__(name, item)
del obj[name]     -> obj.__delitem__(name)
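To see which hook each subscript form triggers, here is a minimal sketch. Registry and its calls list are hypothetical names used only to record the dispatch:

```python
class Registry:
    """Hypothetical mapping-like wrapper that records which hook ran."""
    def __init__(self):
        self._data = {}
        self.calls = []

    def __getitem__(self, name):
        self.calls.append('get')
        return self._data[name]

    def __setitem__(self, name, item):
        self.calls.append('set')
        self._data[name] = item

    def __delitem__(self, name):
        self.calls.append('del')
        del self._data[name]


reg = Registry()
reg['a'] = 1       # __setitem__
value = reg['a']   # __getitem__
del reg['a']       # __delitem__
# reg.calls is now ['set', 'get', 'del']
```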

There are also special methods for +, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, | operators, for example:

obj + other -> obj.__add__(other)
obj | other -> obj.__or__(other)

and in-place operators for augmented assignment, +=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=, for example:

obj += other -> obj.__iadd__(other)
obj |= other -> obj.__ior__(other)

and unary operations:

+obj -> obj.__pos__()
-obj -> obj.__neg__()
~obj -> obj.__invert__()
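The binary, in-place, and unary forms can be compared side by side. In this sketch Tally is a hypothetical class, and the choice to mutate in __iadd__ is an assumption made to show the difference from __add__:

```python
class Tally:
    """Hypothetical counter used to contrast +, += and unary -."""
    def __init__(self, n=0):
        self.n = n

    def __add__(self, other):    # obj + other -> new object
        return Tally(self.n + other)

    def __iadd__(self, other):   # obj += other -> mutate and return self
        self.n += other
        return self

    def __neg__(self):           # -obj -> new object
        return Tally(-self.n)


t = Tally(1)
same = t
t += 2       # calls __iadd__, so `same` sees the change
u = t + 1    # calls __add__, t is untouched
v = -u       # calls __neg__
```

After this runs, same is still the very object bound to t, while u and v are fresh objects.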

Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:

isinstance(obj, cls) -> cls.__instancecheck__(obj)
issubclass(sub, cls) -> cls.__subclasscheck__(sub)
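A minimal sketch of these two hooks, assuming Python 3 metaclass syntax. DuckMeta, Duck, and Robot are hypothetical names, and the "has a quack attribute" rule is invented for illustration:

```python
class DuckMeta(type):
    """Hypothetical metaclass: anything with a quack attribute counts."""
    def __instancecheck__(cls, obj):
        return hasattr(obj, 'quack')

    def __subclasscheck__(cls, sub):
        return hasattr(sub, 'quack')


class Duck(metaclass=DuckMeta):
    pass


class Robot:
    def quack(self):
        return 'beep'


is_duck = isinstance(Robot(), Duck)   # True, although Robot never inherits from Duck
is_sub = issubclass(Robot, Duck)      # True
not_duck = isinstance(42, Duck)       # False
```

In everyday code the abc module wraps this machinery, so defining these hooks by hand is rarely needed.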

An important takeaway is that while builtins like next and bool did not change between Python 2 and 3, the names of the underlying special methods did.

Thus using the builtins also offers more forward compatibility.
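One way to picture this: an implementor who wants to support both versions can alias the old names to the new ones, while callers keep writing next(obj) and bool(obj) unchanged. Ticker and Flag below are hypothetical classes sketching that pattern:

```python
class Ticker:
    """Hypothetical iterator usable from both Python 2 and Python 3."""
    def __init__(self, limit):
        self.i, self.limit = 0, limit

    def __iter__(self):
        return self

    def __next__(self):           # Python 3 name
        if self.i >= self.limit:
            raise StopIteration
        self.i += 1
        return self.i

    next = __next__               # Python 2 name, same implementation


class Flag:
    def __init__(self, on):
        self.on = on

    def __bool__(self):           # Python 3 name
        return self.on

    __nonzero__ = __bool__        # Python 2 name


ticks = Ticker(2)
a = next(ticks)          # 1, the same call on either version
b = next(ticks)          # 2
off = bool(Flag(False))  # False
```

Code that had called ticks.next() or flag.__nonzero__() directly would have needed changes when moving to Python 3.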

When am I supposed to use the special names?

In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."

This is not just cultural; it is also reflected in Python's treatment of APIs. When a package's __init__.py uses import * to provide an API from a subpackage, if the subpackage does not provide an __all__, names that start with underscores are excluded. The subpackage's __name__ would also be excluded.
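The import * behavior can be checked without creating files, by registering a throwaway module in memory. This is a sketch assuming Python 3; the module name fake_api and its two functions are made up for the demonstration:

```python
import sys
import types

# Build a throwaway module in memory; 'fake_api' is a made-up name.
mod = types.ModuleType('fake_api')
mod.public_func = lambda: 'public'
mod._private_func = lambda: 'private'
sys.modules['fake_api'] = mod

# Run a star-import into a fresh namespace and see what arrives.
namespace = {}
exec('from fake_api import *', namespace)

# exec adds '__builtins__' to the namespace, so filter it out.
imported = sorted(name for name in namespace if name != '__builtins__')
# imported == ['public_func']; _private_func and __name__ were skipped
```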

IDE autocompletion tools are inconsistent about treating names that start with underscores as non-public. However, I greatly appreciate not seeing __init__, __new__, __repr__, __str__, __eq__, etc. (nor any of the user created non-public interfaces) when I type the name of an object and a period.

Thus I assert:

The special "dunder" methods are not a part of the public interface. Avoid using them directly.

So when to use them?

The main use-case is when implementing your own custom object or subclass of a builtin object.

Try to use them only when absolutely necessary. Here are some examples:

Use the __name__ special attribute on functions or classes

When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the @wraps(fn) decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the __name__ attribute directly:

from functools import wraps

def decorate(fn):
    @wraps(fn)  # keep fn's name and docstring on the wrapper
    def decorated(*args, **kwargs):
        print('calling fn,', fn.__name__) # exception to the rule
        return fn(*args, **kwargs)
    return decorated
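To show what @wraps preserves, here is a small comparison. bare_decorate, greet, and wave are hypothetical names added for the contrast:

```python
from functools import wraps


def decorate(fn):
    @wraps(fn)                     # copies __name__, __doc__, etc. from fn
    def decorated(*args, **kwargs):
        return fn(*args, **kwargs)
    return decorated


def bare_decorate(fn):             # same wrapper, without @wraps
    def decorated(*args, **kwargs):
        return fn(*args, **kwargs)
    return decorated


@decorate
def greet():
    """Say hello."""
    return 'hello'


@bare_decorate
def wave():
    return 'o/'


wrapped_name = greet.__name__      # 'greet'
unwrapped_name = wave.__name__     # 'decorated'
```

Either way, reading the name means touching __name__ directly, since there is no builtin that returns it.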

Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a __repr__):

def get_class_name(self):
    return type(self).__name__
          # ^          # ^- must use __name__, no builtin e.g. name()
          # use type, not .__class__
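For instance, a __repr__ written this way picks up the subclass name automatically. Base and Child are hypothetical classes for this sketch:

```python
class Base(object):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        # type(self) is the runtime class, so subclasses report their own name
        return '{}({!r})'.format(type(self).__name__, self.value)


class Child(Base):
    pass


base_repr = repr(Base(1))       # 'Base(1)'
child_repr = repr(Child('x'))   # "Child('x')"
```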

Using special attributes to write custom objects or subclassed builtins

When we want to define custom behavior, we must use the data-model names.

This makes sense: since we are the implementors, these attributes aren't private to us.

class Foo(object):
    # required here to implement == for instances:
    def __eq__(self, other):
        # but we still use == for the values:
        return self.value == other.value
    # required here to implement != for instances:
    def __ne__(self, other): # docs recommend for Python 2.
        # use the higher level of abstraction here:
        return not self == other

However, even in this case, we don't use self.value.__eq__(other.value) or not self.__eq__(other) (see my answer here for proof that the latter can lead to unexpected behavior.) Instead, we should use the higher level of abstraction.
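A quick way to see the difference, assuming Python 3, is to compare an int with a float both ways:

```python
# Calling the special method directly exposes the raw protocol value.
direct = (1).__eq__(1.0)    # NotImplemented: int.__eq__ does not handle float

# The operator sees NotImplemented and retries with the reflected
# operand, float.__eq__(1.0, 1), which does know the answer.
via_operator = 1 == 1.0     # True
```

NotImplemented is a signal to the operator machinery rather than an answer, which is why code built on the direct call can misbehave.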

Another point at which we'd need to use the special method names is when we are in a child's implementation, and want to delegate to the parent. For example:

class NoisyFoo(Foo):
    def __eq__(self, other):
        print('checking for equality')
        # required here to call the parent's method
        return super(NoisyFoo, self).__eq__(other) 


Use the builtin functions and operators wherever you can. Only use the special methods where you must to accomplish your goals.