ldevyataykina - 3 months ago
Python Question

Pandas: replace values in strings

I have a DataFrame, and I am trying to fill a new column in it by looking up values from another DataFrame.

I use:

df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])


But I get an error:

File "C:/Users/����� �����������/Desktop/projects/find_time_before_buy/graph (2).py", line 36, in <module>
df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 2101, in map
indexer = arg.index.get_indexer(values)
File "C:\Python27\lib\site-packages\pandas\indexes\base.py", line 2082, in get_indexer
raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects


What should I change?
Here search_term is:

729948 None
729949 None
729950 None
729951 пансионат джемете отдых 2016 цены
729952 None
729953 None
729954 купить телефон
729955 None
729956 вк
729957 None
729958 яндекс


And rep_term looks like:

search_term  code_action
авито        6
вк           9
яндекс       12
мтс          7
связной      8
ситилинк     8

Answer

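The problem is duplicate values in the search_term column of rep_term. After set_index('search_term') the resulting index is not unique, and map cannot look values up in a non-unique index.

The error can be reproduced with a small example: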

import pandas as pd

df = pd.DataFrame({'search_term':[1,2,3]})

print (df)
   search_term
0            1
1            2
2            3

Here the value 1 in search_term has two different values in code_action:

rep_term = pd.DataFrame({'search_term':[1,2,1], 'code_action':['ss','dd','gg']})
print (rep_term)
  code_action  search_term
0          ss            1
1          dd            2
2          gg            1


# This line raises, because the index built from search_term contains 1 twice:
df['term_code'] = df.search_term.map(rep_term.set_index('search_term')['code_action'])
# InvalidIndexError: Reindexing only valid with uniquely valued Index objects

So first identify the rows with duplicated values using duplicated:

print (rep_term[rep_term.duplicated(subset=['search_term'], keep=False)])
  code_action  search_term
0          ss            1
2          gg            1
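
The same condition can also be checked programmatically before mapping. This is only a minimal sketch on the example rep_term from this answer; the variable names keys_are_unique, counts and repeated are chosen here for illustration:

```python
import pandas as pd

rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Series.is_unique is False as soon as any lookup key repeats,
# which is the situation that leads to the error in the question.
keys_are_unique = rep_term['search_term'].is_unique
print(keys_are_unique)

# Count occurrences per key and keep only the keys that appear more than once.
counts = rep_term['search_term'].value_counts()
repeated = counts[counts > 1]
print(repeated)
```

If keys_are_unique is False, repeated lists the search_term values that need to be cleaned up before calling map.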

Then drop the duplicates with drop_duplicates, keeping either the first or the last value for each search_term:

rep_term1 = rep_term.drop_duplicates(subset=['search_term'], keep='first')
print (rep_term1)
  code_action  search_term
0          ss            1
1          dd            2

rep_term2 = rep_term.drop_duplicates(subset=['search_term'], keep='last')
print (rep_term2)
  code_action  search_term
1          dd            2
2          gg            1
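
With the duplicates removed, the original map call should work. The sketch below puts the steps together on the example data and assumes that keeping the first code_action per search_term is what you want; the name lookup is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'search_term': [1, 2, 3]})
rep_term = pd.DataFrame({'search_term': [1, 2, 1],
                         'code_action': ['ss', 'dd', 'gg']})

# Assumption: the first code_action seen for each search_term wins.
lookup = (rep_term.drop_duplicates(subset=['search_term'], keep='first')
                  .set_index('search_term')['code_action'])

# The index of lookup is unique now, so map() can use it for the lookup.
df['term_code'] = df['search_term'].map(lookup)
print(df)
```

Values of search_term that have no match in rep_term (3 in this example, or None in your real data) get NaN in term_code.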