Justas Mundeikis - 3 years ago
R Question

Dataframe subsetting by list, not recognising "NA" values

I have the following issue: I import data from a CSV file. The imported data looks like this:

df <- data.frame(x=c(1,2,3,4,5), y=c("K","M",NA,NA,"K"))


Here K denotes 1 000 and M denotes 1 000 000. I would like to create a new column with dplyr, using a lookup list (a named vector) to translate K and M into multipliers and multiplying them by the values in the x column:

sul <- c("K"=1000, "M"=1000000, "NA"=1)


So using dplyr:

df %>% mutate(result=x * sul[y])


My problem, though, is that the NA values that result from importing the data from a CSV file are not recognized in sul[y], and I get either NA or NULL. Do you have an idea how to solve this problem in an elegant way? Is there a better way than running:

df$y[is.na(df$y)] <- 1


Thanks a lot!

P.S. Subsetting by a list was chosen instead of a for loop to speed up processing the data.

Answer

It may be better to replace NA with 'Other' and then do

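library(dplyr)

# Lookup vector with an explicit 'Other' entry for missing units
sul <- c(K=1000, M=1000000, Other=1)
df %>%
    # y1 is a temporary character copy of y with NA replaced by "Other"
    mutate(y1 = replace(as.character(y), is.na(y), "Other"),
           result = x*sul[y1]) %>%
    select(-y1)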
#  x    y  result
#1 1    K    1000
#2 2    M 2000000
#3 3 <NA>       3
#4 4 <NA>       4
#5 5    K    5000
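
The as.character() call in the snippet above matters whenever y is a factor, which was the default for strings in data.frame() and read.csv() before R 4.0.0. A factor used as an index is treated as its integer codes, not its labels. A minimal base R sketch of the difference, using a hypothetical lookup vector (not from the question) whose order differs from the factor's level order:

```r
# Hypothetical lookup vector, deliberately ordered differently from the levels.
lookup <- c(M = 1000000, K = 1000)

# Levels are sorted alphabetically, so K has code 1 and M has code 2.
y_factor <- factor(c("K", "M"))

# Indexing with the factor uses the integer codes 1 and 2 (positions).
by_code <- unname(lookup[y_factor])

# Indexing with the character labels uses the names "K" and "M".
by_name <- unname(lookup[as.character(y_factor)])

print(by_code)
print(by_name)
```

With the sul vector from the question the two happen to agree, because K and M appear in the same order as the factor levels, but converting to character first avoids relying on that coincidence.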

The 'NA' in sul is a character string and not a real NA. So, if we are using the 'sul' from the OP's post, replace the real NA values in 'y' with the string "NA":

df %>%
    mutate(result = x*sul[replace(as.character(y), is.na(y), "NA")])
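
To see why the replacement is needed, the following base R sketch looks up a missing value and the literal string "NA" in the question's sul vector. The variable names missing_lookup and string_lookup are only illustrative:

```r
# Lookup vector from the question; the third name is the string "NA".
sul <- c("K" = 1000, "M" = 1000000, "NA" = 1)

# A missing character index never matches any name, so the result is NA.
missing_lookup <- unname(sul[NA_character_])

# The literal string "NA" matches the third element by name.
string_lookup <- unname(sul["NA"])

print(missing_lookup)
print(string_lookup)
```

This is why sul[y] returns NA for the rows where y is missing, even though sul has an element named "NA".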