user1017373 - 4 months ago
Python Question

Renaming multiple values in a data frame column to a single value

I have a data frame that is 1 GB in size; the following is a dummy version of it.

df <- data.frame(group=rep(c("A", "B", "C","D","E","F","G","H"), each=4),height=sample(100:150, 16))
df
   group height
1      A    105
2      A    119
3      B    108
4      B    114
5      C    109
6      C    111
7      D    148
8      D    121
9      E    133
10     E    101
11     F    143
12     F    135
13     G    147
14     G    141
15     H    150
16     H    145


What I am aiming for is to recode the values in the group column: for example, all B, H, and G become NC, all A become PC, and everything else becomes NON.
I tried the following one-liner:

de=c("B")
df =df$group[df$group %in% de,]<-"NC"


But it throws the following error:

Error in `[<-.factor`(`*tmp*`, df$group %in% de, , value = "nc") :
incorrect number of subscripts on matrix
In addition: Warning message:
In `[<-.factor`(`*tmp*`, df$group %in% de, , value = "nc") :
invalid factor level, NA generated


In the end, the data frame df should look like this

df
   group height
1     PC    105
2     PC    119
3     NC    108
4     NC    114
5    NON    109
6    NON    111
7    NON    148
8    NON    121
9    NON    133
10   NON    101
11   NON    143
12   NON    135
13    NC    147
14    NC    141
15    NC    150
16    NC    145


Any suggestions in R or pandas would be great.
Thank you

Answer

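In R, the group column is a factor, so assigning a value that is not one of its existing levels gives the "invalid factor level, NA generated" warning. The trailing comma inside the brackets also indexes the vector as if it had two dimensions, which causes the "incorrect number of subscripts" error.

Convert the column to character first, then replace the values directly: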

df$group <- as.character(df$group)
df$group[df$group %in% c("B")] <- "NC"
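
Since the question also asks about pandas, a rough equivalent of this direct replacement might look like the sketch below. The small frame is made up for illustration and only stands in for the real data. pandas string columns have no fixed set of levels, so the assignment works without any conversion step.

```python
import pandas as pd

# Hypothetical small frame standing in for the real 1 GB data.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C"],
    "height": [105, 119, 108, 114, 109, 111],
})

# Values to be renamed (mirrors `de` in the question).
de = ["B"]

# isin() builds a boolean mask; .loc assigns to the matching rows in place.
df.loc[df["group"].isin(de), "group"] = "NC"
print(df)
```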

Edit:

Since you updated your question, you can use nested ifelse calls. This creates a new column, group2, but you can also overwrite the group column with the same approach.

df$group2 <- ifelse(df$group %in% c("B", "H", "G"), "NC", ifelse(df$group %in% c("A"), "PC", "NON"))
head(df, 10)
   group height group2
1      A    139     PC
2      A    114     PC
3      A    132     PC
4      A    141     PC
5      B    107     NC
6      B    101     NC
7      B    122     NC
8      B    129     NC
9      C    100    NON
10     C    108    NON
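
For the pandas side of the question, one possible counterpart to the nested ifelse is a dictionary lookup with a default. This is only a sketch: the frame is rebuilt from the 16 example rows in the question, and the name `mapping` is an illustrative choice. `Series.map` returns NaN for values missing from the dictionary, and `fillna` then supplies the catch-all label.

```python
import pandas as pd

# The 16 example rows from the question.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C", "D", "D",
              "E", "E", "F", "F", "G", "G", "H", "H"],
    "height": [105, 119, 108, 114, 109, 111, 148, 121,
               133, 101, 143, 135, 147, 141, 150, 145],
})

# Explicit renames; any group not listed here falls through to "NON".
mapping = {"A": "PC", "B": "NC", "G": "NC", "H": "NC"}

# map() looks each value up in the dict (NaN when absent),
# then fillna() fills in the default label.
df["group2"] = df["group"].map(mapping).fillna("NON")
print(df)
```

To overwrite the original column instead, assign the result back to `df["group"]`.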