ihsansat ihsansat - 3 months ago 27
Python Question

Change a pandas DataFrame based on a Series

I have some data that I converted to a pandas DataFrame:

import pandas as pd
d = [
(1,70399,0.988375133622),
(1,33919,0.981573492596),
(1,62461,0.981426807114),
(579,1,0.983018778374),
(745,1,0.995580488899),
(834,1,0.980942505189)
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])


I need to build a Series that maps the values in the source and target columns to new values:

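e = []
for x in d:
    e.append(x[0])
    e.append(x[1])

e = sorted(set(e))  # unique IDs, sorted so the new codes follow ascending ID order
df_new = pd.DataFrame(e, columns=['source_target'])

# Assumed definition: keep the original IDs to use as the index of the mapping
new_source_old = df_new.source_target.copy()
df_new.source_target = (df_new.source_target.diff() != 0).cumsum() - 1
new_ser = pd.Series(df_new.source_target.values, index=new_source_old).drop_duplicates()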


so I get this Series:

source_target
1 0
579 1
745 2
834 3
33919 4
62461 5
70399 6
dtype: int64


I tried to change the DataFrame df_beda based on the new_ser Series using:

df_beda.target = df_beda.target.mask(df_beda.target.isin(new_ser), df_beda.target.map(new_ser)).astype(int)
df_beda.source = df_beda.source.mask(df_beda.source.isin(new_ser), df_beda.source.map(new_ser)).astype(int)


but the result is:

   source  target    weight
0       0   70399  0.988375
1       0   33919  0.981573
2       0   62461  0.981427
3     579       0  0.983019
4     745       0  0.995580
5     834       0  0.980943


That is wrong. The expected result is:

   source  target    weight
0       0       6  0.988375
1       0       4  0.981573
2       0       5  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943


Can anyone help me see where my mistake is?

Thanks.

Answer

If the order doesn't matter, you can do the following. Avoid for loops unless they are absolutely necessary.

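import numpy as np

# Sorted unique IDs across both columns
uniq_vals = np.unique(df_beda[['source','target']])
# Original ID -> new consecutive code (range replaces Python 2's xrange)
map_dict = dict(zip(uniq_vals, range(len(uniq_vals))))
df_beda[['source','target']] = df_beda[['source','target']].replace(map_dict)

print(df_beda)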

   source  target    weight
0       0       6  0.988375
1       0       4  0.981573
2       0       5  0.981427
3       1       0  0.983019
4       2       0  0.995580
5       3       0  0.980943
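
As for where the attempt in the question goes wrong: Series.isin(new_ser) compares against the values of new_ser (0 to 6), not against its index, so only IDs that happen to equal one of the new codes (here just 1) get replaced. A minimal sketch of the difference, assuming new_ser is indexed by the original IDs as in the question; the names ids and fixed are only for illustration:

```python
import pandas as pd

d = [
    (1, 70399, 0.988375133622),
    (1, 33919, 0.981573492596),
    (1, 62461, 0.981426807114),
    (579, 1, 0.983018778374),
    (745, 1, 0.995580488899),
    (834, 1, 0.980942505189),
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])

# Rebuild the mapping from the question: original ID (index) -> new code (value)
ids = sorted(set(df_beda['source']) | set(df_beda['target']))
new_ser = pd.Series(range(len(ids)), index=ids)

# isin() tests membership in the VALUES of new_ser, so 70399, 33919 and 62461
# are reported as missing even though they are in the index
print(df_beda['target'].isin(new_ser).tolist())

# map() looks each ID up in the index of new_ser, so no mask is needed
fixed = df_beda.copy()
fixed['source'] = fixed['source'].map(new_ser)
fixed['target'] = fixed['target'].map(new_ser)
print(fixed)
```

If you prefer to keep the mask, testing against new_ser.index instead of new_ser should give the same result.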

If you want to roll back, you can create an inverse map from the original one, because it is guaranteed to be a 1-to-1 mapping.

# Swap keys and values (items replaces Python 2's iteritems)
inverse_map = {v:k for k,v in map_dict.items()}
df_beda[['source','target']] = df_beda[['source','target']].replace(inverse_map)
print(df_beda)

   source  target    weight
0       1   70399  0.988375
1       1   33919  0.981573
2       1   62461  0.981427
3     579       1  0.983019
4     745       1  0.995580
5     834       1  0.980943
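
An alternative that avoids building the dictionaries by hand is pd.factorize, which returns the new codes and the array of original values in one call. This is only a sketch of the same encode and roll-back steps, not a replacement for the approach above; the names cols, flat, encoded and restored are made up for the example:

```python
import pandas as pd

d = [
    (1, 70399, 0.988375133622),
    (1, 33919, 0.981573492596),
    (1, 62461, 0.981426807114),
    (579, 1, 0.983018778374),
    (745, 1, 0.995580488899),
    (834, 1, 0.980942505189),
]
df_beda = pd.DataFrame(d, columns=['source', 'target', 'weight'])
cols = ['source', 'target']

# Flatten both columns row by row, then factorize them together.
# With sort=True, uniques is sorted and codes[i] is the position of flat[i] in it.
flat = df_beda[cols].to_numpy().ravel()
codes, uniques = pd.factorize(flat, sort=True)

# Encode: put the codes back into the two-column shape
encoded = df_beda.copy()
encoded[cols] = codes.reshape(-1, len(cols))
print(encoded)

# Roll back: index uniques with the codes to recover the original IDs
restored = encoded.copy()
restored[cols] = uniques[encoded[cols].to_numpy()]
print(restored)
```

Here uniques plays the role of the inverse map, so nothing extra has to be stored to undo the encoding.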