entercaspa - 1 month ago
Python Question

pandas new column equals another column with condition

I have a pandas DataFrame called data:

   transaction_id  house_id    date_sale  sale_price boolean_2015
0               1         1  31 Mar 2016    £880,000         True
3               4         2  31 Mar 2016    £450,000         True
4               5         3  31 Mar 2016    £680,000         True
6               7         4  31 Mar 2016  £1,850,000         True
7               8         5  31 Mar 2016    £420,000         True


and another one called houses:

   id                                            address  postcode postcode first
0   1  Flat 78, Andrewes House, Barbican, London, Gre...  EC2Y 8AY           EC2Y
1   2  Flat 35, John Trundle Court, Barbican, London,...  EC2Y 8DJ           EC2Y


My question is: how do I add a column to data called 'postcode_first', where I look up each data['house_id'] in houses and put the first part of the matching postcode into that row of data['postcode_first']?

The closest I got was

data['postcode'] = np.where(houses['id'] == data['house_id'])


but this doesn't make sense at all.
Any help, guys?

EDIT

I also tried
data['postcode'] = houses.loc[houses['id'] == data['house_id']]['postcode_first']


but this returned

Traceback (most recent call last):
  File "/Users/saminahbab/Documents/House_Prices/final project/sql_functions.py", line 30, in <module>
    data['postcode'] = houses.loc[houses['id'] == data['house_id']]['postcode_first']
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/ops.py", line 735, in wrapper
    raise ValueError('Series lengths must match to compare')
ValueError: Series lengths must match to compare


The lengths shouldn't matter, because what I am essentially trying to say is:
data['postcode'] equals houses['postcode_first'] WHERE houses['id'] equals data['house_id']

Answer

You can use the Series.map() method:

In [108]: df['postcode'] = df.house_id.map(h.set_index('id')['postcode first'])

In [109]: df
Out[109]:
   transaction_id  house_id    date_sale  sale_price boolean_2015 postcode
0               1         1  31 Mar 2016    £880,000         True     EC2Y
3               4         2  31 Mar 2016    £450,000         True     EC2Y
4               5         3  31 Mar 2016    £680,000         True      NaN
6               7         4  31 Mar 2016  £1,850,000         True      NaN
7               8         5  31 Mar 2016    £420,000         True      NaN
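
For reference, here is a minimal self-contained sketch of the same lookup, written with the question's names (data, houses) rather than df and h. The two frames are cut-down stand-ins typed in by hand from the rows shown in the question, and the sketch assumes each id occurs only once in houses, since as far as I know map() with a Series needs a unique index on the lookup.

```python
import pandas as pd

# Hand-built stand-ins for the question's frames (only the columns needed here).
data = pd.DataFrame(
    {"transaction_id": [1, 4, 5, 7, 8], "house_id": [1, 2, 3, 4, 5]},
    index=[0, 3, 4, 6, 7],
)
houses = pd.DataFrame(
    {
        "id": [1, 2],
        "postcode": ["EC2Y 8AY", "EC2Y 8DJ"],
        "postcode first": ["EC2Y", "EC2Y"],
    }
)

# Lookup Series: index is the house id, values are the first part of the postcode.
lookup = houses.set_index("id")["postcode first"]

# map() looks each house_id up in the lookup's index.
# house_ids that do not appear in houses (3, 4, 5 here) come back as NaN.
data["postcode_first"] = data["house_id"].map(lookup)
print(data)
```

Because map() works value by value against the lookup's index, the two frames do not need the same length or the same row labels, which is what the direct == comparison in the question tripped over.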

Data:

In [110]: h
Out[110]:
   id                                         address  postcode postcode first
0   1  Flat 78, Andrewes House, Barbican, London, Gre  EC2Y 8AY           EC2Y
1   2   Flat 35, John Trundle Court, Barbican, London  EC2Y 8DJ           EC2Y

In [113]: df
Out[113]:
   transaction_id  house_id    date_sale  sale_price boolean_2015
0               1         1  31 Mar 2016    £880,000         True
3               4         2  31 Mar 2016    £450,000         True
4               5         3  31 Mar 2016    £680,000         True
6               7         4  31 Mar 2016  £1,850,000         True
7               8         5  31 Mar 2016    £420,000         True
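
A left merge is another way to get the "WHERE houses['id'] equals data['house_id']" behaviour described in the question. This is a different technique from the map() answer above, shown only as a rough sketch: the frames are hand-built stand-ins, and renaming the columns to house_id and postcode_first is my own choice for illustration.

```python
import pandas as pd

# Hand-built stand-ins for the question's frames (only the columns needed here).
data = pd.DataFrame(
    {"transaction_id": [1, 4, 5, 7, 8], "house_id": [1, 2, 3, 4, 5]},
    index=[0, 3, 4, 6, 7],
)
houses = pd.DataFrame(
    {
        "id": [1, 2],
        "postcode": ["EC2Y 8AY", "EC2Y 8DJ"],
        "postcode first": ["EC2Y", "EC2Y"],
    }
)

# Keep only the key and the wanted column, renamed so the key matches data.
lookup = houses[["id", "postcode first"]].rename(
    columns={"id": "house_id", "postcode first": "postcode_first"}
)

# how="left" keeps every row of data; house_ids with no match get NaN.
merged = data.merge(lookup, on="house_id", how="left")
print(merged)
```

Note that merge() returns a new frame with a fresh 0..n-1 index rather than keeping data's original row labels, so map() is the simpler choice when only one column is being looked up.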