Federico Gentile Federico Gentile - 16 days ago 10
Python Question

Detect indexes of a dataframe from values contained in another dataframe with Pandas

I have 2 dataframes:

# dataframe 1
data = {'Name':['PINO','PALO','TNCO' ,'TNTO','CUCO' ,'FIGO','ONGF','LABO'],
'Id' :[ 10 , 9 ,np.nan , 14 , 3 ,np.nan, 7 ,np.nan]}
df1 = pd.DataFrame(data)


and

# dataframe 2
convert_table = {'Name': ['ALLO','BELO','CACO','CUCO','DADO','FIGO','FIGO','ONGF','PALO','PALO','PINO','TNCO','TNCO','TNCO','TNTO']}
df2 = pd.DataFrame(convert_table)


My goal is to identify the indexes of the elements of
df2['Name']
which follow these conditions:


  • Are present in
    df1['Name']

  • Have the correspondin
    df1['Id'] = NaN



I was able to achieve my goal by using the following lines of code:

nan_names = df1['Name'][df1['Id'].isnull()]

df3 = pd.DataFrame()
for name in nan_names:
index = df2[df2['Name']==name].index.tolist()
if index:
dic = {'name':[name] , 'index':[index]}
df3 = pd.concat([df3,pd.DataFrame(dic)], ignore_index=True)


However I would like to know if there is a more efficient and elegant way to achieve my goal.

The result should look like this:

index name
0 [11, 12, 13] TNCO
1 [5, 6] FIGO


Note: if the name is not found then, it is not needed to store any information.

Answer

You're looking for the method isin:

df = df2[df2['Name'].isin(nan_names)]

This will return:

    Name
5   FIGO
6   FIGO
11  TNCO
12  TNCO
13  TNCO

From there it's just a matter of formatting:

df.reset_index().groupby('Name')['index'].apply(list)

This will return:

Name
FIGO          [5, 6]
TNCO    [11, 12, 13]

The idea is to reset the index so that it becomes a column (named index). Grouping by name and applying the list function will return the list of original indices for each name.

Calling reset_index once more will return the result you were looking for.


Edit

Combine everything into a one-liner, this will be the output:

In [21]: df2[df2['Name'].isin(nan_names)].reset_index().groupby('Name')['index'].apply(list).reset_index()
Out[21]: 
   Name         index
0  FIGO        [5, 6]
1  TNCO  [11, 12, 13]