Jan Jan - 2 months ago 13

Pandas dataframe slicing

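I have the following DataFrame (the category labels are German: Gewinn pro Aktie in EUR is earnings per share in EUR, KGV is the price/earnings ratio and Gewinnwachstum is earnings growth):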

    2012   2013   2014   2015   2016   2017   2018                Kategorie
0   5.31   5.27   5.61   4.34   4.54   5.02   7.07  Gewinn pro Aktie in EUR
1  13.39  14.70  12.45  16.29  15.67  14.17  10.08                      KGV
2 -21.21  -0.75   6.45 -22.63  -7.75   9.76  47.52           Gewinnwachstum
3 -17.78   2.27  -0.55   3.39   1.48   0.34    NaN                      PEG


Now, I am selecting only the KGV row with:

df[df["Kategorie"] == "KGV"]


Which outputs:

    2012  2013   2014   2015   2016   2017   2018  Kategorie
1  13.39  14.7  12.45  16.29  15.67  14.17  10.08        KGV


How do I calculate the mean() of the last five years (2016, 2015, 2014, 2013 and 2012 in this example)?

I tried

df[df["Kategorie"] == "KGV"]["2016":"2012"].mean()


but this throws a TypeError. Why can I not slice the columns here?
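A minimal reproduction of the error, assuming the year columns are string labels (as the question's own code implies) and the default integer row index shown above. The variable names kgv, first_two_rows and error are only for illustration. What it shows is that a slice inside plain [] is applied to the rows, not the columns:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the year column labels are assumed to be strings.
df = pd.DataFrame(
    {
        "2012": [5.31, 13.39, -21.21, -17.78],
        "2013": [5.27, 14.70, -0.75, 2.27],
        "2014": [5.61, 12.45, 6.45, -0.55],
        "2015": [4.34, 16.29, -22.63, 3.39],
        "2016": [4.54, 15.67, -7.75, 1.48],
        "2017": [5.02, 14.17, 9.76, 0.34],
        "2018": [7.07, 10.08, 47.52, np.nan],
        "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
    }
)

kgv = df[df["Kategorie"] == "KGV"]  # one-row DataFrame whose row label is 1

# With a slice, plain [] selects ROWS. Integer bounds are positional,
# so this returns the first two rows of df.
first_two_rows = df[0:2]

# String bounds are checked against the integer row index rather than the
# column labels, which is what raises the TypeError from the question.
error = None
try:
    kgv["2016":"2012"]
except TypeError as exc:
    error = exc

print(type(error).__name__)  # TypeError
```

Selecting columns through plain [] needs a list of labels instead, e.g. kgv[["2012", "2013"]].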

Answer

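Plain [] with a slice always selects rows, not columns, so ["2016":"2012"] is matched against the integer row index and its string bounds raise the TypeError. loc takes a row selector and a column selector separately and supports label slicing on the columns, which runs from left to right in column order. mean(axis=1) then averages across the selected columns of the row: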

df.loc[df["Kategorie"] == "KGV", "2012":"2016"].mean(axis=1)
Out: 
1    14.5
dtype: float64
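A slightly expanded sketch of the loc approach on the same rebuilt frame, again assuming string year labels. It also shows what mean() does without axis=1, what a reversed slice returns, and an alternative that first moves Kategorie into the index; all variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the year column labels are assumed to be strings.
df = pd.DataFrame(
    {
        "2012": [5.31, 13.39, -21.21, -17.78],
        "2013": [5.27, 14.70, -0.75, 2.27],
        "2014": [5.61, 12.45, 6.45, -0.55],
        "2015": [4.34, 16.29, -22.63, 3.39],
        "2016": [4.54, 15.67, -7.75, 1.48],
        "2017": [5.02, 14.17, 9.76, 0.34],
        "2018": [7.07, 10.08, 47.52, np.nan],
        "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
    }
)

mask = df["Kategorie"] == "KGV"

# Row selector and column selector in one call; the label slice is inclusive.
row_means = df.loc[mask, "2012":"2016"].mean(axis=1)  # Series indexed by row label
kgv_mean = row_means.iloc[0]  # plain float, about 14.5

# Without axis=1, mean() averages each column down the rows; on a one-row
# frame that just gives back the five original values.
per_column = df.loc[mask, "2012":"2016"].mean()

# Reversed bounds select no columns, since "2016" comes after "2012".
reversed_cols = df.loc[mask, "2016":"2012"]

# Alternative: index by Kategorie so the row can be addressed by name.
kgv_mean_by_label = df.set_index("Kategorie").loc["KGV", "2012":"2016"].mean()
```

Moving Kategorie into the index turns the row lookup into a label lookup, so the result of .loc is a Series and mean() needs no axis argument.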

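Note that this does not necessarily mean 2012, 2013, 2014, 2015 and 2016. A label slice with loc follows the DataFrame's current column order: it selects every column from '2012' through '2016', both ends included. The labels are strings, not a numeric range of years, so if a column named foo sat between them, it would be selected too.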
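To average exactly the years 2012 to 2016 regardless of what else sits between them, one option is to pass an explicit list of labels instead of a slice. This sketch inserts a hypothetical numeric column foo between 2014 and 2015 purely to show the difference; foo and the other variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; the year column labels are assumed to be strings.
df = pd.DataFrame(
    {
        "2012": [5.31, 13.39, -21.21, -17.78],
        "2013": [5.27, 14.70, -0.75, 2.27],
        "2014": [5.61, 12.45, 6.45, -0.55],
        "2015": [4.34, 16.29, -22.63, 3.39],
        "2016": [4.54, 15.67, -7.75, 1.48],
        "2017": [5.02, 14.17, 9.76, 0.34],
        "2018": [7.07, 10.08, 47.52, np.nan],
        "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
    }
)

# Hypothetical extra column at position 3, i.e. between "2014" and "2015".
df.insert(3, "foo", 0.0)

mask = df["Kategorie"] == "KGV"

# The slice follows column position, so it picks up "foo" as well.
sliced_cols = df.loc[mask, "2012":"2016"].columns.tolist()

# An explicit list selects only the named year columns.
years = [str(year) for year in range(2012, 2017)]  # "2012" .. "2016"
explicit_mean = df.loc[mask, years].mean(axis=1).iloc[0]  # about 14.5
```

A list keeps the selection tied to the column names rather than their positions, at the cost of spelling the years out (here built with range, which assumes consecutive year labels).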
