triphook triphook - 1 month ago 5
Python Question

Pandas lookup/pivot using column headings

I have a table containing watershed IDs and land cover classes:

WatershedID LandCover
2 Corn
8 Corn
2 Soy
8 Soy


and a separate lookup table which contains the area for each watershed/land cover combination:

WatershedID Corn Soy
2 14 1
3 2 14
5 18 8
7 21 2
8 6 31


What I would like to do is to append a column to the first table which contains the corresponding row/column value in the lookup table, like so:

WatershedID LandCover Area
2 Corn 14
8 Corn 6
2 Soy 1
8 Soy 31


I've managed to do this by iterating with a for loop:

areas = []
for watershed_id, land_cover in tableA.iterrows():
areas.append(tableB.loc[watershed_id][land_cover]


but given the size of my tables, this is slow. Is there a faster way to do this that doesn't involve iteration? I've been experimenting with MultiIndexing and pivot tables, but nothing has worked so far.

Answer

You can use unstack with merge:

df3 = df2.set_index('WatershedID').unstack().reset_index()
df3.columns = ['LandCover','WatershedID','Area']
print (df3)
  LandCover  WatershedID  Area
0      Corn            2    14
1      Corn            3     2
2      Corn            5    18
3      Corn            7    21
4      Corn            8     6
5       Soy            2     1
6       Soy            3    14
7       Soy            5     8
8       Soy            7     2
9       Soy            8    31

print (pd.merge(df1,df3))
   WatershedID LandCover  Area
0            2      Corn    14
1            8      Corn     6
2            2       Soy     1
3            8       Soy    31

If there are more same columns you need specify columns for join:

print (pd.merge(df1,df3, on=['WatershedID','LandCover']))
   WatershedID LandCover  Area
0            2      Corn    14
1            8      Corn     6
2            2       Soy     1
3            8       Soy    31
Comments