Matt - 2 months ago 41

Python Question

Just curious.

I have some data I am working with, and when I input

`train.Id.shape`

Python returned

`(1467,)`

but when I input

`train.shape[0]`

Python returned

`1467`

I'm curious how pandas handles these two inputs, and why the results are different.

Is this a specific feature, or just a quirk?

Answer

`train.Id` is a pandas Series and is one-dimensional. `train` is a pandas DataFrame and is two-dimensional. `shape` is an attribute that both DataFrames and Series have, and it is always a tuple. For a Series the tuple has only one value, `(x,)`. For a DataFrame, `shape` is a tuple with two values, `(x, y)`. Indexing with `[0]` pulls the first element out of that tuple, which is why `train.shape[0]` gives the plain integer `1467` rather than a tuple. So `train.Id.shape[0]` would also return `1467`. However, `train.Id.shape[1]` would raise an `IndexError`, while `train.shape[1]` would give you the number of columns in `train`.
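To make the indexing behaviour concrete, here is a minimal sketch. The frame is a hypothetical stand-in for the asker's data (the column `A` and the `arange` values are assumptions, only the row count of 1467 comes from the question), and the variable names are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data: 1467 rows, two columns.
train = pd.DataFrame({"Id": np.arange(1467), "A": np.arange(1467)})

n_rows = train.shape[0]    # first entry of the DataFrame's (rows, columns) tuple
n_cols = train.shape[1]    # second entry: the number of columns
n_ids = train.Id.shape[0]  # a Series has a one-element shape tuple

# The Series shape tuple has no second entry, so indexing it raises IndexError.
try:
    train.Id.shape[1]
    series_has_second_dim = True
except IndexError:
    series_has_second_dim = False

print(n_rows, n_cols, n_ids, series_has_second_dim)
```

Because `shape` is an ordinary Python tuple, the error here is the usual tuple "index out of range" error rather than anything specific to pandas.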

Furthermore, pandas Panel objects are three-dimensional, and `shape` for a Panel returns a tuple `(x, y, z)`. (Panel was deprecated in pandas 0.20 and removed in pandas 1.0.)

```
import numpy as np
import pandas as pd

# Build a 1467-row frame like the one in the question
train = pd.DataFrame(dict(Id=np.arange(1467), A=np.arange(1467)))
print(train.shape)     # (1467, 2)
print(train.Id.shape)  # (1467,)
```