jonas jonas - 1 month ago 5
Python Question

python pandas replacing strings in dataframe with numbers

Is there anyway to use the mapping function or something better to replace values in an entire dataframe?

I only know how to perform the mapping on series.

I would like to replace the strings in the 'tesst' and 'set' column with a number
for example set = 1, test =2

Here is a example of my dataset: (Original dataset is very large)

ds_r
respondent brand engine country aware aware_2 aware_3 age tesst set
0 a volvo p swe 1 0 1 23 set set
1 b volvo None swe 0 0 1 45 set set
2 c bmw p us 0 0 1 56 test test
3 d bmw p us 0 1 1 43 test test
4 e bmw d germany 1 0 1 34 set set
5 f audi d germany 1 0 1 59 set set
6 g volvo d swe 1 0 0 65 test set
7 h audi d swe 1 0 0 78 test set
8 i volvo d us 1 1 1 32 set set


Final result should be

ds_r
respondent brand engine country aware aware_2 aware_3 age tesst set
0 a volvo p swe 1 0 1 23 1 1
1 b volvo None swe 0 0 1 45 1 1
2 c bmw p us 0 0 1 56 2 2
3 d bmw p us 0 1 1 43 2 2
4 e bmw d germany 1 0 1 34 1 1
5 f audi d germany 1 0 1 59 1 1
6 g volvo d swe 1 0 0 65 2 1
7 h audi d swe 1 0 0 78 2 1
8 i volvo d us 1 1 1 32 1 1


grateful for advise,

Answer

What about DataFrame.replace?

In [9]: mapping = {'set': 1, 'test': 2}

In [10]: df.replace({'set': mapping, 'tesst': mapping})
Out[10]: 
   Unnamed: 0 respondent  brand engine  country  aware  aware_2  aware_3  age  \
0           0          a  volvo      p      swe      1        0        1   23   
1           1          b  volvo   None      swe      0        0        1   45   
2           2          c    bmw      p       us      0        0        1   56   
3           3          d    bmw      p       us      0        1        1   43   
4           4          e    bmw      d  germany      1        0        1   34   
5           5          f   audi      d  germany      1        0        1   59   
6           6          g  volvo      d      swe      1        0        0   65   
7           7          h   audi      d      swe      1        0        0   78   
8           8          i  volvo      d       us      1        1        1   32   

  tesst set  
0     2   1  
1     1   2  
2     2   1  
3     1   2  
4     2   1  
5     1   2  
6     2   1  
7     1   2  
8     2   1  

As @Jeff pointed out in the comments, in pandas versions < 0.11.1, manually tack .convert_objects() onto the end to properly convert tesst and set to int64 columns, in case that matters in subsequent operations.

Comments