Null-Hypothesis - 18 days ago 5
Python Question

Pandas: Key error in merge after creating index

I have a large DataFrame to merge. To make sure the merge takes place in a multiprocessing manner, I decided to use indexes. But after creating the indexes I get a KeyError.

For example:

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
(Pdb) df1
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3


And the second DataFrame:

df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'C': ['C1', 'C2', 'C3', 'C4']})
(Pdb) df2
    A   C
0  A0  C1
1  A1  C2
2  A2  C3
3  A3  C4


Now I set the index on both DataFrames, using column A as the index.

df1.set_index('A', inplace=True)
df2.set_index('A', inplace=True)

(Pdb) df1
     B
A
A0  B0
A1  B1
A2  B2
A3  B3

(Pdb) df2
     C
A
A0  C1
A1  C2
A2  C3
A3  C4


Now when I do the merge:

(Pdb) result = pd.merge(df1, df2, on='A')
*** KeyError: 'A'


But if I do this without creating the index, the merge takes place without a KeyError.

(Pdb) df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
(Pdb) df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'C': ['C1', 'C2', 'C3', 'C4']})
(Pdb) result = pd.merge(df1, df2, on='A')
(Pdb) result
    A   B   C
0  A0  B0  C1
1  A1  B1  C2
2  A2  B2  C3
3  A3  B3  C4

Answer

If you merge on the index, you should do both of the following:

  • not specify the key in merge
  • use the left_index = True, right_index = True arguments to merge

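Otherwise, you have to explicitly tell merge what your key is using on = (or left_on = and right_on = when the key is named differently in each DataFrame). Note that merge has no key = argument.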
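
The two approaches above can be sketched as follows, reusing the DataFrames from the question. The variable names (df1_indexed, by_index, by_join, by_column) are made up for illustration; DataFrame.join is shown as a third option because it aligns on the index by default. This is a minimal sketch, not the only way to do it.

```python
import pandas as pd

# Same data as in the question.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 'C': ['C1', 'C2', 'C3', 'C4']})

# Move column A into the index (returns new frames instead of using inplace=True).
df1_indexed = df1.set_index('A')
df2_indexed = df2.set_index('A')

# Option 1: merge on the index; no on= argument is given.
by_index = pd.merge(df1_indexed, df2_indexed, left_index=True, right_index=True)

# Option 2: join aligns on the index by default.
by_join = df1_indexed.join(df2_indexed)

# Option 3: turn the index back into a column and name it with on=.
by_column = pd.merge(df1_indexed.reset_index(), df2_indexed.reset_index(), on='A')

print(by_index)
print(by_column)
```

Options 1 and 2 keep A as the index of the result, while option 3 returns A as a regular column. Newer pandas versions also let on= refer to a named index level, so the original pd.merge(df1, df2, on='A') call may not raise a KeyError there.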