min min - 1 month ago 6
R Question

Got an error using ifelse inside mutate inside the for loop

I have a list of 244 data frames named datas. Each one looks like the following:

datas[[1]]

Year sal
2000 10000
2000 15000
2005 10000
2005 9000
2005 12000
2010 15000
2010 12000
2010 20000
2013 25000
2013 15000
2015 20000


I would like to make a new column called fix.sal by multiplying sal by a different value for each year. For example, I multiply sal by 2 in the rows where Year is 2000. In the same way, the multiplier is 1.8 for 2005, 1.5 for 2010, 1.2 for 2013, and 1 for 2015. So the result should look like this:

Year sal fix.sal
2000 10000 20000
2000 15000 30000
2005 10000 18000
2005 9000 16200
2005 12000 21600
2010 15000 22500
2010 12000 18000
2010 20000 30000
2013 25000 30000
2013 15000 18000
2015 20000 20000


I managed to do this for a single data frame by using ifelse inside mutate, which is from the dplyr package.

library(dplyr)
datas[[1]]<-mutate(datas[[1]], fix.sal=
ifelse(datas[[1]]$Year==2000,datas[[1]]$sal*2,
ifelse(datas[[1]]$Year==2005,datas[[1]]$sal*1.8,
ifelse(datas[[1]]$Year==2010,datas[[1]]$sal*1.5,
ifelse(datas[[1]]$Year==2013,datas[[1]]$sal*1.2,
datas[[1]]$sal*1)))))


But I have to apply this operation to all 244 data frames in the list datas.

So I tried to do it with a for loop, like this:

for(i in 1:244){
datas[[i]]<-mutate(datas[[i]], fix.sal=
ifelse(datas[[i]]$Year==2000,datas[[i]]$sal*2,
ifelse(datas[[i]]$Year==2005,datas[[i]]$sal*1.8,
ifelse(datas[[i]]$Year==2010,datas[[i]]$sal*1.5,
ifelse(datas[[i]]$Year==2013,datas[[i]]$sal*1.2,
datas[[i]]$sal*1)))))
}


Then I got this error:

Error: invalid subscript type 'integer'


How can I solve this...?

Any comments will be greatly appreciated! :)

Answer

Please don't force yourself to use ifelse for this. Instead, create a vector with your multipliers, then use the year to select from the vector. The vector will look something like this:

multiplier <-
  c("2005" = 1.2
    , "2006" = 1.05
    , "2007" = 0.9)

Fill in whatever multiplier applies to each year in your data. Here is some sample data (every data frame is identical, but that doesn't matter):

datas <-
  lapply(1:3, function(idx){
    data.frame(
      Year = 2005:2007
      , sal = c(10, 20, 30)
    )
  })

Finally, we can use lapply to loop through the list. Each time through, it uses Year to pick a value from the multiplier vector (note the use of as.character; without it, R would look for, e.g., the 2005th entry instead of the one named "2005"). lapply returns a new list, so assign the result back to datas if you want to keep it.

library(dplyr)  # provides mutate()

lapply(datas, function(x){
  # look up each row's multiplier by the name matching its Year
  mutate(x, fix.sal = sal*multiplier[as.character(Year)])
})

returns:

[[1]]
  Year sal fix.sal
1 2005  10      12
2 2006  20      21
3 2007  30      27

[[2]]
  Year sal fix.sal
1 2005  10      12
2 2006  20      21
3 2007  30      27

[[3]]
  Year sal fix.sal
1 2005  10      12
2 2006  20      21
3 2007  30      27
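To see why as.character matters, here is a minimal sketch in base R that reuses the multiplier vector from above. The variable names year, by_position, and by_name are only for illustration.

```r
multiplier <- c("2005" = 1.2, "2006" = 1.05, "2007" = 0.9)
year <- 2005

# Numeric index: R looks for the 2005th element of a length-3 vector,
# which does not exist, so the result is NA
by_position <- multiplier[year]

# Character index: R looks up the element whose name is "2005"
by_name <- multiplier[as.character(year)]

is.na(by_position)  # the positional lookup is missing
unname(by_name)     # the named lookup gives the 2005 multiplier, 1.2
```

So a numeric Year silently produces NA rather than an error, which is easy to miss.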

For more compact code, you can use:

lapply(datas, mutate, fix.sal = sal*multiplier[as.character(Year)])

but that makes it slightly less clear to me what is happening.
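Applied to the years and multipliers from the question, the same idea might look like the sketch below. It uses base R instead of mutate so that it runs without dplyr; the two small data frames stand in for the real list of 244, and the helper name add_fix_sal is made up for this example.

```r
# Multipliers from the question, named by year (names are strings)
multiplier <- c("2000" = 2, "2005" = 1.8, "2010" = 1.5,
                "2013" = 1.2, "2015" = 1)

# Hypothetical stand-in for the real list of 244 data frames
datas <- list(
  data.frame(Year = c(2000, 2005, 2010), sal = c(10000, 9000, 20000)),
  data.frame(Year = c(2013, 2015), sal = c(25000, 20000))
)

# Hypothetical helper: add fix.sal to one data frame.
# unname() drops the year names that the lookup carries along.
add_fix_sal <- function(x) {
  x$fix.sal <- x$sal * unname(multiplier[as.character(x$Year)])
  x
}

# lapply returns a new list, so assign it back to keep the new column
datas <- lapply(datas, add_fix_sal)

datas[[1]]$fix.sal  # 20000 16200 30000, matching the question's expected values
```

Any Year that has no entry in multiplier ends up with NA in fix.sal, so it is worth checking for missing values afterwards if your data might contain other years.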
