entercaspa entercaspa - 1 month ago 7
Python Question

Pandas means in column for subset in another column

I have a dataframe called houses:

transaction_id house_id date_sale sale_price boolean_2015 \
0 1 1 31 Mar 2016 £880,000 True
3 4 2 31 Mar 2016 £450,000 True
4 5 3 31 Mar 2016 £680,000 True
6 7 4 31 Mar 2016 £1,850,000 True
postcode
0 EC2Y
3 EC2Y
4 EC1Y
6 EC2Y


and I was wondering how to compute averages of sale_price based on each postcode
so the output is

Average
0 EC1Y £123220
1 EC2Y £434930


I did this with
averages = data.groupby(['postcode'], as_index=False).mean()

but this did not return sale_price
any thoughts?

Answer

You can first replace £, to empty string and then convert to_numeric column sale_price. Last cast to string by astype if need add £ to column sale_price:

data.sale_price = pd.to_numeric(data.sale_price.str.replace('[£,]',''))
averages = data.groupby(['postcode'], as_index=False)['sale_price'].mean()
averages.sale_price = '£' + averages.sale_price.astype(str) 
print (averages)
  postcode sale_price
0     EC1Y    £680000
1     EC2Y   £1060000
Comments