triphook triphook - 9 months ago 49
Python Question

Pandas lookup/pivot using column headings

I have a table containing watershed IDs and land cover classes:

WatershedID LandCover
2 Corn
8 Corn
2 Soy
8 Soy


and a separate lookup table which contains the area for each watershed/land cover combination:

WatershedID Corn Soy
2 14 1
3 2 14
5 18 8
7 21 2
8 6 31


What I would like to do is to append a column to the first table which contains the corresponding row/column value in the lookup table, like so:

WatershedID LandCover Area
2 Corn 14
8 Corn 6
2 Soy 1
8 Soy 31


I've managed to do this by iterating with a for loop:

areas = []
for watershed_id, land_cover in tableA.iterrows():
areas.append(tableB.loc[watershed_id][land_cover]


but given the size of my tables, this is slow. Is there a faster way to do this that doesn't involve iteration? I've been experimenting with MultiIndexing and pivot tables, but nothing has worked so far.

Answer Source

You can use unstack with merge:

df3 = df2.set_index('WatershedID').unstack().reset_index()
df3.columns = ['LandCover','WatershedID','Area']
print (df3)
  LandCover  WatershedID  Area
0      Corn            2    14
1      Corn            3     2
2      Corn            5    18
3      Corn            7    21
4      Corn            8     6
5       Soy            2     1
6       Soy            3    14
7       Soy            5     8
8       Soy            7     2
9       Soy            8    31

print (pd.merge(df1,df3))
   WatershedID LandCover  Area
0            2      Corn    14
1            8      Corn     6
2            2       Soy     1
3            8       Soy    31

If there are more same columns you need specify columns for join:

print (pd.merge(df1,df3, on=['WatershedID','LandCover']))
   WatershedID LandCover  Area
0            2      Corn    14
1            8      Corn     6
2            2       Soy     1
3            8       Soy    31