Keozon Keozon - 1 month ago 8
Python Question

Reason for allowing Special Characters in Python Attributes

I somewhat accidentally discovered that you can set 'illegal' attributes on an object using setattr. By illegal, I mean attributes with names that can't be retrieved using the __getattr__ interface with traditional . operator references. They can only be retrieved via the getattr function.

This, to me, seems rather astonishing, and I'm wondering if there's a reason for this, or if it's just something overlooked, etc. Since there exists an operator for retrieving attributes, and a standard implementation of the setattribute interface, I would expect it to only allow attribute names that can actually be retrieved normally. And, if you had some bizarre reason to want attributes that have invalid names, you would have to implement your own interface for them.

Am I alone in being surprised by this behavior?

class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", "this can't be reached")
dir(foo)


This returns something that is both odd, and a little misleading:
[...'__weakref__', 'bar.baz']


And if I want to access foo.bar.baz in the 'standard' way, I cannot. The inability to retrieve it makes perfect sense, but the ability to set it is surprising.

foo.bar.baz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute 'bar'
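For contrast, here is a minimal sketch of the retrieval path that does work. It reuses the Foo class from above; the names value and dotted_error are only illustrative. The point it tries to show is that getattr receives the name as an ordinary string, while the dotted form is parsed as two separate lookups.

```python
class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", "this can't be reached")

# getattr takes the name as a plain string, so the "." is never parsed.
value = getattr(foo, "bar.baz")
print(value)

# The dotted form is parsed as (foo.bar).baz, so "bar" is looked up first
# and that lookup is what fails.
dotted_error = None
try:
    foo.bar.baz
except AttributeError as exc:
    dotted_error = str(exc)
    print(dotted_error)
```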


Is it simply assumed that, if you have to use setattr to set the variable, you are going to reference it via getattr? Because at runtime, this may not always be true, especially with Python's interactive interpreter, reflection, etc. It still seems very odd that this would be permitted by default.

EDIT: A (very rough) example of what I would expect to see as the default implementation of setattr:

import re

class Safe:
    "stores attrs"

    def __setattr__(self, attr, value):
        # Accept only names that start with a letter or underscore,
        # followed by letters, digits, or underscores.
        if not re.fullmatch(r"[^\W\d]\w*", attr):
            raise AttributeError("Invalid characters in attribute name")
        super().__setattr__(attr, value)


This will not permit me to use invalid characters in my attribute names. Obviously, super() could not be used on the base object class, but this is just an example.
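As an alternative to a hand-written regular expression, the same idea could be sketched with str.isidentifier() and keyword.iskeyword() from the standard library. This is only an illustration of the check being asked for, not how Python behaves by default; the class name Strict and the variables strict and rejected are hypothetical.

```python
import keyword

class Strict:
    "rejects attribute names that dot syntax cannot reach"

    def __setattr__(self, name, value):
        # str.isidentifier() applies the language's identifier rules.
        # Keywords such as "class" pass that test but still cannot follow
        # a dot, so they are rejected separately.
        if not name.isidentifier() or keyword.iskeyword(name):
            raise AttributeError(f"invalid attribute name: {name!r}")
        super().__setattr__(name, value)

strict = Strict()
setattr(strict, "bar", 1)  # a normal identifier is accepted

# Collect the names that the check refuses.
rejected = []
for name in ("bar.baz", "2fast", "class", "with-dash"):
    try:
        setattr(strict, name, 0)
    except AttributeError:
        rejected.append(name)
print(rejected)
```

One design consequence of a check like this is that it would also block legitimate dynamic uses, such as mirroring arbitrary external keys onto an object.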

Answer

I think that your assumption that attributes must be "identifiers" is incorrect. As you've noted, Python objects support arbitrary attribute names (not just identifiers) because, for most objects, the attributes are stored in the instance's __dict__ (which is a dict and therefore supports arbitrary string keys). However, for an attribute access operator to exist at all, the set of names reachable through it has to be restricted so that the parser can recognize them unambiguously.
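A short sketch of the storage behavior described above, assuming an ordinary class without __slots__ (the Foo class is borrowed from the question, and type_error_raised is just an illustrative flag):

```python
class Foo:
    "stores attrs"

foo = Foo()
setattr(foo, "bar.baz", 1)

# The attribute is simply a key in the instance dictionary.
print(vars(foo))

# A key written directly into __dict__ is visible to getattr as well.
foo.__dict__["with space"] = 2
print(getattr(foo, "with space"))

# The one restriction setattr does enforce: the name must be a string.
type_error_raised = False
try:
    setattr(foo, 42, "x")
except TypeError as exc:
    type_error_raised = True
    print(exc)
```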

Is it simply assumed that, if you have to use setattr to set the variable, you are going to reference it via getattr?

No. I don't think that's assumed. I think that the assumption is that if you're referencing attributes using the . operator, then you know what those attributes are. And if you have the ability to know what those attributes are, then you probably have control over what they're called. And if you have control over what they're called, then you can name them something that the parser knows how to handle ;-).
