jonas jonas - 3 months ago 12
Python Question

python pandas replacing strings in dataframe with numbers

Is there anyway to use the mapping function or something better to replace values in an entire dataframe?

I only know how to perform the mapping on series.

I would like to replace the strings in the 'tesst' and 'set' column with a number
for example set = 1, test =2

Here is a example of my dataset: (Original dataset is very large)

ds_r
respondent brand engine country aware aware_2 aware_3 age tesst set
0 a volvo p swe 1 0 1 23 set set
1 b volvo None swe 0 0 1 45 set set
2 c bmw p us 0 0 1 56 test test
3 d bmw p us 0 1 1 43 test test
4 e bmw d germany 1 0 1 34 set set
5 f audi d germany 1 0 1 59 set set
6 g volvo d swe 1 0 0 65 test set
7 h audi d swe 1 0 0 78 test set
8 i volvo d us 1 1 1 32 set set


Final result should be

ds_r
respondent brand engine country aware aware_2 aware_3 age tesst set
0 a volvo p swe 1 0 1 23 1 1
1 b volvo None swe 0 0 1 45 1 1
2 c bmw p us 0 0 1 56 2 2
3 d bmw p us 0 1 1 43 2 2
4 e bmw d germany 1 0 1 34 1 1
5 f audi d germany 1 0 1 59 1 1
6 g volvo d swe 1 0 0 65 2 1
7 h audi d swe 1 0 0 78 2 1
8 i volvo d us 1 1 1 32 1 1


grateful for advise,

Answer

What about DataFrame.replace?

In [9]: mapping = {'set': 1, 'test': 2}

In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]: 
   Unnamed: 0 respondent  brand engine  country  aware  aware_2  aware_3  age  \
0           0          a  volvo      p      swe      1        0        1   23   
1           1          b  volvo   None      swe      0        0        1   45   
2           2          c    bmw      p       us      0        0        1   56   
3           3          d    bmw      p       us      0        1        1   43   
4           4          e    bmw      d  germany      1        0        1   34   
5           5          f   audi      d  germany      1        0        1   59   
6           6          g  volvo      d      swe      1        0        0   65   
7           7          h   audi      d      swe      1        0        0   78   
8           8          i  volvo      d       us      1        1        1   32   

  tesst set  
0     2   1  
1     1   2  
2     2   1  
3     1   2  
4     2   1  
5     1   2  
6     2   1  
7     1   2  
8     2   1  

As @Jeff pointed out in the comments, in pandas versions < 0.11.1, manually tack .convert_objects() onto the end to properly convert tesst and set to int64 columns, in case that matters in subsequent operations.