Marcus Renno - 2 months ago
Python Question

Pandas - Self reference of instances in column

I have the following DataFrame:

  SampleID ParentID
0      S10      S20
1      S10      S30
2      S20      S40
3      S30
4      S40


How can I replace each string in the 'ParentID' column with the index of the row whose 'SampleID' matches it?

Expected result:

  SampleID ParentID
0      S10        2
1      S10        3
2      S20        4
3      S30
4      S40


The closest result I found for this problem was:
How to self-reference column in pandas Data Frame?

Answer

I think you can use merge and then assign the resulting index column back to ParentID:

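import pandas as pd

# Match each SampleID (together with its row index) against the ParentID values
df1 = pd.merge(df[['SampleID']].reset_index(),
               df[['ParentID']],
               left_on='SampleID',
               right_on='ParentID')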
print (df1)
   index SampleID ParentID
0      2      S20      S20
1      3      S30      S30
2      4      S40      S40

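# Assignment aligns on index labels: df1 rows 0-2 go to df rows 0-2,
# which is correct here only because the matches come out in that order
df['ParentID'] = df1['index']
df.fillna('', inplace=True)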
print (df)
  SampleID ParentID
0      S10        2
1      S10        3
2      S20        4
3      S30         
4      S40      
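
The assignment above works because the rows of df1 happen to line up with the rows of df they belong to. Below is a minimal sketch of a merge variant that should not depend on row order. It assumes missing parents are stored as NaN/None and that df has the default integer index; the names parent_index_via_merge, parent_idx and key are illustrative and not part of the original answer.

```python
import pandas as pd

def parent_index_via_merge(df):
    """Return, for each row, the index of the row whose SampleID equals its ParentID."""
    # One row per SampleID with its row label; duplicated SampleIDs keep the first one.
    lookup = (
        df[["SampleID"]]
        .reset_index()
        .rename(columns={"index": "parent_idx", "SampleID": "key"})
        .drop_duplicates(subset="key")
    )
    # A left merge keeps exactly one output row per row of df, in df's order.
    merged = df[["ParentID"]].merge(
        lookup, how="left", left_on="ParentID", right_on="key"
    )
    # Restore df's row labels so the result aligns with df on assignment.
    merged.index = df.index
    # Nullable integer dtype keeps whole numbers alongside missing values.
    return merged["parent_idx"].astype("Int64")

df = pd.DataFrame({
    "SampleID": ["S10", "S10", "S20", "S30", "S40"],
    "ParentID": ["S20", "S30", "S40", None, None],
})
result = df.assign(ParentID=parent_index_via_merge(df))
print(result)
```

Because the lookup is merged onto the ParentID column rather than the other way round, each row of df receives its own match even if the parents appear in a different order than the samples.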

Another solution uses map with a dict in which keys and values are swapped, so that each SampleID maps to its row index:

d = {v: k for k, v in df.SampleID.items()}  # Series.iteritems() was removed in pandas 2.0
print (d)
{'S10': 1, 'S40': 4, 'S20': 2, 'S30': 3}

df.ParentID = df.ParentID.map(d)
# Assign the result back; fillna(inplace=True) on a selected column may act on a copy
df.ParentID = df.ParentID.fillna('')
print (df)
  SampleID ParentID
0      S10        2
1      S10        3
2      S20        4
3      S30         
4      S40
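
For reference, here is a self-contained sketch of the map approach. It assumes missing parents are stored as NaN/None. Instead of filling the gaps with empty strings, it keeps the column numeric using the nullable Int64 dtype, which is a swapped-in choice and not part of the answer above. The name lookup is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "SampleID": ["S10", "S10", "S20", "S30", "S40"],
    "ParentID": ["S20", "S30", "S40", None, None],
})

# SampleID -> row index. For a duplicated SampleID the later row wins,
# matching the dict built in the answer (S10 maps to 1).
lookup = dict(zip(df["SampleID"], df.index))

# Values without a matching key (including missing parents) become NaN;
# Int64 then stores them as <NA> while the matches stay whole numbers.
df["ParentID"] = df["ParentID"].map(lookup).astype("Int64")
print(df)
```

If blank cells are preferred for display, the column can still be filled with '' as in the answer, at the cost of turning it into an object column that mixes numbers and strings.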