Sreejith Menon Sreejith Menon - 5 months ago 8
Python Question

Manipulating a DataFrame created from a dict(tuple-float)

I have been trying to create a DataFrame from a dictionary that has the below structure.

imgAlbmShrDict = {('10', 'photo_album_57'): 20.0,
('10', 'photo_album_8'): 20.0,
('1061', 'photo_album_29'): 100.0,
('1061', 'photo_album_90'): 90.0,
('1102', 'photo_album_29'): 80.0,
('1102', 'photo_album_90'): 60.0,
('1300', 'photo_album_15'): 100.0,
('1300', 'photo_album_89'): 60.0,
('1301', 'photo_album_15'): 88.88888888888889,
('1301', 'photo_album_89'): 60.0
}


pd.DataFrame(imgAlbmShrDict,index=['Proportion']).transpose()


Proportion
10 photo_album_57 20.000000
photo_album_8 20.000000
1061 photo_album_29 100.000000
photo_album_90 90.000000
1102 photo_album_29 80.000000
photo_album_90 60.000000
1300 photo_album_15 100.000000
photo_album_89 60.000000
1301 photo_album_15 88.888889
photo_album_89 60.000000


The output is exactly how I need, but I am unable to extract only the the first two columns from the data frame.
The first column actually is an image ID and the second column is a album in which the image appeared.

I need help in accessing the columns and a way to add a column whilst preserving a structure.

Required output:

Proportion URL
10 photo_album_57 20.000000 www.something.com/10.jpeg
photo_album_8 20.000000
1061 photo_album_29 100.000000 www.something.com/1061.jpeg
photo_album_90 90.000000
1102 photo_album_29 80.000000 www.something.com/1102.jpeg
photo_album_90 60.000000
1300 photo_album_15 100.000000 www.something.com/1300.jpeg
photo_album_89 60.000000
1301 photo_album_15 88.888889 www.something.com/1301.jpeg
photo_album_89 60.000000

Answer

You can use get_level_values, because first two columns is Multiindex:

print (df.index.get_level_values(0))
Index(['10', '10', '1061', '1061', '1102', '1102', '1300', '1300', '1301',
       '1301'],
      dtype='object')

df['URL'] = 'www.something.com/' + df.index.get_level_values(0) + '.jpg'
print (df) 
                     Proportion                         URL
10   photo_album_57   20.000000    www.something.com/10.jpg
     photo_album_8    20.000000    www.something.com/10.jpg
1061 photo_album_29  100.000000  www.something.com/1061.jpg
     photo_album_90   90.000000  www.something.com/1061.jpg
1102 photo_album_29   80.000000  www.something.com/1102.jpg
     photo_album_90   60.000000  www.something.com/1102.jpg
1300 photo_album_15  100.000000  www.something.com/1300.jpg
     photo_album_89   60.000000  www.something.com/1300.jpg
1301 photo_album_15   88.888889  www.something.com/1301.jpg
     photo_album_89   60.000000  www.something.com/1301.jpg

Maybe need drop_duplicates:

df = df.drop_duplicates(subset='URL')
print (df) 
                     Proportion                         URL
10   photo_album_57   20.000000    www.something.com/10.jpg
1061 photo_album_29  100.000000  www.something.com/1061.jpg
1102 photo_album_29   80.000000  www.something.com/1102.jpg
1300 photo_album_15  100.000000  www.something.com/1300.jpg
1301 photo_album_15   88.888889  www.something.com/1301.jpg

Another solution with reset_index and set column names:

df.reset_index(inplace=True)
df.columns = ['ID','Album','Proportion']
df['URL'] = 'www.something.com/' + df['ID'] + '.jpg'
print (df)
     ID           Album  Proportion                         URL
0    10  photo_album_57   20.000000    www.something.com/10.jpg
1    10   photo_album_8   20.000000    www.something.com/10.jpg
2  1061  photo_album_29  100.000000  www.something.com/1061.jpg
3  1061  photo_album_90   90.000000  www.something.com/1061.jpg
4  1102  photo_album_29   80.000000  www.something.com/1102.jpg
5  1102  photo_album_90   60.000000  www.something.com/1102.jpg
6  1300  photo_album_15  100.000000  www.something.com/1300.jpg
7  1300  photo_album_89   60.000000  www.something.com/1300.jpg
8  1301  photo_album_15   88.888889  www.something.com/1301.jpg
9  1301  photo_album_89   60.000000  www.something.com/1301.jpg

EDIT1:

Thank you stephen for solution.

I try make it better by boolean indexing with Index.duplicated:

mask = ~df.index.get_level_values(0).duplicated()
print (mask)
[ True False  True False  True False  True False  True False]

subindex = df.index[mask]

df.loc[subindex, 'URL'] = 'www.something.com/' + subindex.get_level_values(0) + '.jpg'
df.URL.fillna('', inplace=True)
print (df)
                     Proportion                         URL
10   photo_album_57   20.000000    www.something.com/10.jpg
     photo_album_8    20.000000                            
1061 photo_album_29  100.000000  www.something.com/1061.jpg
     photo_album_90   90.000000                            
1102 photo_album_29   80.000000  www.something.com/1102.jpg
     photo_album_90   60.000000                            
1300 photo_album_15  100.000000  www.something.com/1300.jpg
     photo_album_89   60.000000                            
1301 photo_album_15   88.888889  www.something.com/1301.jpg
     photo_album_89   60.000000          
Comments