Matt Matt - 3 days ago 7
Python Question

Why does dataframe.shape[0] print an integer, but dataframe.columnname.shape print a tuple?

Just curious.

I have some data I am working with, and when I input

train.Id.shape


Python returned
(1467,)
- a tuple

but when I input

train.shape[0]


Python returned
1467
- an integer

Curious how Pandas handles these two different inputs, and why they are different.
Is this a specific feature, or just a quirk?

Answer

train.Id is a pandas Series and is one dimensional. train is a pandas DataFrame and is two dimensional. shape is an attribute that both DataFrames and Series have, and it is always a tuple. For a Series the tuple has only one value, (x,). For a DataFrame the tuple has two values, (x, y), where x is the number of rows and y is the number of columns. Indexing the tuple with [0] is what gives you a plain integer, so train.Id.shape[0] would also return 1467. However, train.Id.shape[1] would raise an IndexError because the tuple has only one element, while train.shape[1] would give you the number of columns in train.
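To make the indexing behaviour concrete, here is a minimal sketch. The frame below is a hypothetical stand-in for the asker's data (1467 rows, with an extra column A invented for illustration), not the real train.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 1467 rows, two columns.
train = pd.DataFrame({"Id": np.arange(1467), "A": np.arange(1467)})

df_shape = train.shape          # two-element tuple: (rows, columns)
series_shape = train.Id.shape   # one-element tuple: (rows,)

n_rows = train.shape[0]         # indexing the tuple yields an int
n_cols = train.shape[1]
series_len = train.Id.shape[0]

# A one-element tuple has no index 1, so this raises IndexError.
try:
    train.Id.shape[1]
    error_name = None
except IndexError as exc:
    error_name = type(exc).__name__

print(df_shape, series_shape, n_rows, n_cols, series_len, error_name)
```

In other words, the difference in the question comes from the [0] index, not from pandas treating the two objects differently.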

Furthermore, pandas Panel objects (since deprecated and removed from pandas) were three dimensional, and shape for them returned a tuple (x, y, z).
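Because Panel is no longer available in current pandas releases, the sketch below uses a three-dimensional NumPy array as a stand-in to show a three-value shape tuple. The variable names and sizes are made up for illustration. It also checks that ndim matches the length of the shape tuple in each case.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(6))                              # 1-D
frame = pd.DataFrame(np.arange(6).reshape(3, 2),
                     columns=["a", "b"])                      # 2-D
cube = np.arange(24).reshape(2, 3, 4)                         # 3-D NumPy array, standing in for Panel

# The number of dimensions equals the number of entries in shape.
for obj in (series, frame, cube):
    assert obj.ndim == len(obj.shape)

shapes = [series.shape, frame.shape, cube.shape]
print(shapes)  # [(6,), (3, 2), (2, 3, 4)]
```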

import numpy as np
import pandas as pd

train = pd.DataFrame(dict(Id=np.arange(1437), A=np.arange(1437)))

print(train.shape)
print(train.Id.shape)

(1437, 2)
(1437,)