wim - 1 year ago
Python Question

Attribute access on int literals

>>> 1 .__hash__()
1
>>> 1.__hash__()
  File "<stdin>", line 1
SyntaxError: invalid syntax

It has been covered here before that the second example doesn't work because the int literal is actually parsed as a float.

My question is: why doesn't Python parse this as attribute access on an int, when the interpretation as a float is a syntax error? The docs section on lexical analysis seems to suggest that whitespace is only required when other interpretations are ambiguous, but perhaps I'm reading this section wrong.

On a hunch it seems like the lexer is greedy (trying to take the biggest token possible), but I have no source for this claim.

Answer Source

Read it carefully; it says:

Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens).
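The quoted rule can be checked directly with the standard tokenize module. This is only a small sketch; the helper name `names` is made up for illustration and is not part of any library.

```python
import io
import tokenize

def names(source):
    """Return the NAME tokens the tokenizer finds in source (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NAME]

# Without whitespace the two letters merge into one identifier token.
print(names("ab"))   # ['ab']
# With whitespace they are two separate identifier tokens.
print(names("a b"))  # ['a', 'b']
```

Note that the tokenizer happily accepts "a b" even though it is not a valid expression; deciding that is the parser's job, not the tokenizer's.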

1.__hash__() is tokenized as:

import io, tokenize

# tokenize.tokenize expects a readline-style callable that returns bytes
for token in tokenize.tokenize(io.BytesIO(b"1.__hash__()").readline):
    print(token.string)

#>>> utf-8
#>>> 1.
#>>> __hash__
#>>> (
#>>> )

Python always chooses the longest possible token; after tokenizing, no two adjacent tokens should be combinable into a single valid token. The logic is very similar to that in your other question.
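As a rough illustration of the longest-match behaviour, the sketch below compares a few spellings that do and do not work. The helpers `token_strings` and `is_syntax_error` are hypothetical names introduced here. The exact error message for the failing case may differ between Python versions, so the sketch only checks that a SyntaxError is raised.

```python
import io
import tokenize

def token_strings(source):
    """Return the non-empty token strings produced for source (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline) if tok.string]

def is_syntax_error(source):
    """Return True if source fails to compile as an expression (hypothetical helper)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False

# A space stops the number token after "1", so "." becomes its own token.
print(token_strings("1 .__hash__()"))  # ['1', '.', '__hash__', '(', ')']
# With two dots, the first is swallowed into the float literal "1.".
print(token_strings("1..__hash__()"))  # ['1.', '.', '__hash__', '(', ')']

# The spelling from the question is rejected, while these alternatives work.
print(is_syntax_error("1.__hash__()"))  # True
print(eval("1 .__hash__()"))            # 1
print(eval("(1).__hash__()"))           # 1
```

Parentheses are probably the most readable workaround, since the intent does not hinge on a single space.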

The confusion seems to come from not recognizing tokenizing as a completely distinct step that runs before parsing. If the grammar allowed splitting up tokens solely to make the parser happy, then surely you'd expect

_ or1.

to tokenize as

_ or 1.

but there is no such rule, so it tokenizes as