moondra - 3 months ago
Python Question

Changing data in a DataFrame column (pandas) with a for loop

I'm trying to convert the values in the "Mathscore" column to numeric values, keeping them in the same "Mathscore" column:

Strong = 1
Weak = 0

I tried doing this with the function below, using a for loop, but I can't get the code to run. Is the way I'm assigning the data incorrect?

Thanks!

import pandas as pd

data = {
    'Id_Student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong', 'Strong',
                  'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
}

df = pd.DataFrame(data)
df

# Strong = 1 and Weak = 0

##def tran_mathscore(x):
##    if x == 'Strong':
##        return 1
##    if x == 'Weak':
##        return 0
##
##df['Trans_MathScore'] = df['Mathscore'].apply(tran_mathscore)
##df


##df.Mathscore[0]=["Weak"]

##print(df.columns)
##
##
##print(df.Mathscore)

def tran_mathscore():
    for i in df.Mathscore:
        if i == "Strong":
            df.Mathscore[i] = ['1']

        elif i == "Weak":
            df.Mathscore[i] = ['0']


tran_mathscore()

Answer
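
On the loop in the question: `for i in df.Mathscore` yields the values themselves ('Strong', 'Weak'), so `df.Mathscore[i]` uses a string as the index label instead of pointing at the row being visited. If you do want to keep an explicit loop, here is a minimal sketch that collects the converted values and assigns the column once. The `frame` parameter and the `ValueError` for unexpected labels are my own additions, not part of the original code:

```python
import pandas as pd

df = pd.DataFrame({
    'Id_Student': [1, 2, 3, 4],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong'],
})

def tran_mathscore(frame):
    """Replace 'Strong'/'Weak' in frame['Mathscore'] with 1/0 using a plain loop."""
    scores = []
    for value in frame['Mathscore']:
        if value == 'Strong':
            scores.append(1)
        elif value == 'Weak':
            scores.append(0)
        else:
            # Assumption: any other label is treated as an error.
            raise ValueError(f"unexpected Mathscore value: {value!r}")
    # Assign the whole column at once instead of writing cell by cell.
    frame['Mathscore'] = scores

tran_mathscore(df)
print(df)
```

The vectorized options below avoid the explicit loop altogether.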

You can convert the column to a categorical dtype:

In [23]: df['Mathscore'] = df.Mathscore.astype('category').cat.rename_categories(['1','0'])

In [24]: df
Out[24]:
   Id_Student Mathscore
0           1         1
1           2         0
2           3         0
3           4         1
4           5         1
5           6         0
6           7         1
7           8         1
8           9         0
9          10         1

In [25]: df.dtypes
Out[25]:
Id_Student       int64
Mathscore     category
dtype: object
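
One thing to watch with the `rename_categories(['1','0'])` call above: a list is matched to the categories by position, and for string data the categories come out in sorted order ('Strong' before 'Weak'), so it only works because of that ordering. A sketch of the same idea with an explicit dict, which should not depend on the order (the sample data is the one from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'Id_Student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong', 'Strong',
                  'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
})

cat = df['Mathscore'].astype('category')
# Inspect the category order before renaming by position.
print(list(cat.cat.categories))

# A dict maps old category -> new category explicitly.
df['Mathscore'] = cat.cat.rename_categories({'Strong': '1', 'Weak': '0'})
print(df.dtypes)
```

Note that the new categories here are the strings '1' and '0', same as in the session above, not integers.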

Or map it with a dictionary (`d`, shown in In [29]):

In [27]: df
Out[27]:
   Id_Student Mathscore
0           1    Strong
1           2      Weak
2           3      Weak
3           4    Strong
4           5    Strong
5           6      Weak
6           7    Strong
7           8    Strong
8           9      Weak
9          10    Strong

In [28]: df.Mathscore.map(d)
Out[28]:
0    1
1    0
2    0
3    1
4    1
5    0
6    1
7    1
8    0
9    1
Name: Mathscore, dtype: int64

In [29]: d
Out[29]: {'Strong': 1, 'Weak': 0}

In [30]: df['Mathscore'] = df.Mathscore.map(d)

In [31]: df
Out[31]:
   Id_Student  Mathscore
0           1          1
1           2          0
2           3          0
3           4          1
4           5          1
5           6          0
6           7          1
7           8          1
8           9          0
9          10          1

In [32]: df.dtypes
Out[32]:
Id_Student    int64
Mathscore     int64
dtype: object
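
The session above does not show where `d` is defined, so here is the map approach as a self-contained script. It also shows what I would expect for a label that is missing from the dict: `Series.map` returns NaN for it. The 'Average' label is made up for the illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Id_Student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Mathscore': ['Strong', 'Weak', 'Weak', 'Strong', 'Strong',
                  'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
})

# Lookup table from label to numeric code.
d = {'Strong': 1, 'Weak': 0}
df['Mathscore'] = df['Mathscore'].map(d)
print(df)

# Labels that are not keys of the dict become NaN instead of raising.
unmapped = pd.Series(['Strong', 'Average']).map(d)
print(unmapped)
```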

PS: I prefer the first option, as the categorical dtype uses much less memory when a large column holds only a few distinct values.
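
A rough way to check the memory claim yourself, assuming a hypothetical 100,000-row column with the same two labels. The exact byte counts depend on your pandas version, so none are quoted here:

```python
import pandas as pd

n = 100_000  # hypothetical column length, chosen only for illustration
labels = pd.Series(['Strong', 'Weak'] * (n // 2))

as_category = labels.astype('category')
as_int = labels.map({'Strong': 1, 'Weak': 0}).astype('int64')

# deep=True counts the string payloads; index=False leaves the index out.
for name, s in [('strings', labels), ('category', as_category), ('int64', as_int)]:
    print(name, s.memory_usage(index=False, deep=True))
```

With only two categories, the categorical column stores small integer codes plus one copy of each label, which is why it should come out smaller than both the raw strings and the int64 column.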