Krowar - 1 month ago
R Question

Why is this sapply not working on my data-frame? (titanic kaggle)

I have the data frame from the Kaggle Titanic competition and I am trying to replace the NA values in the Age column. To do so, I tried the following code:

df.train <- read.csv('data/titanic_train.csv')


fixe.age <- function(passenger){
  returnedage <- passenger$Age
  if(is.na(returnedage)==T){
    if(passenger$Pclasse==1){
      returnedage <- 37
    }
    else if(passenger$Plasse == 2){
      returnedage <- 29
    }
    else{
      returnedage <- 24
    }
  }
  else{
    returnedage <- passenger$Age
  }
  return(returnedage)
}

sapply(df.train, fixe.age)


I receive the following error:


Error in passenger$Age : $ operator is invalid for atomic vectors


Is my approach totally wrong?

Thanks a lot

Answer

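It doesn't work because a data frame is a list of columns, so sapply(df.train, fixe.age) calls fixe.age once per column, not once per row. Each call receives an atomic vector, which is why passenger$Age fails. To go row by row you would use apply(df.train, 1, ...) (that is, MARGIN = 1), but note that apply first converts the data frame to a matrix, so each row also arrives as an atomic vector. You would have to index with passenger["Age"] instead of $, and because df.train contains text columns the values come through as character strings.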
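
To make the column-versus-row behaviour concrete, here is a minimal sketch on a made-up data frame. The `toy` data and its values are hypothetical stand-ins, not the Kaggle file; only base R functions are used.

```r
# Toy stand-in for the Titanic data (values are invented for illustration).
toy <- data.frame(
  Name   = c("A", "B", "C"),
  Pclass = c(1, 2, 3),
  Age    = c(22, NA, 40),
  stringsAsFactors = FALSE
)

# sapply() walks the columns: the function is called once per column.
col_classes <- sapply(toy, class)
print(col_classes)

# Each column is an atomic vector, so `$` inside the function raises an error.
err <- tryCatch(
  sapply(toy, function(passenger) passenger$Age),
  error = function(e) conditionMessage(e)
)
print(err)

# apply(..., 1, ...) walks the rows, but only after converting the data frame
# to a matrix. With a text column present, every row is a character vector.
row_classes <- apply(toy, 1, class)
print(row_classes)

# Name-based indexing works on such a row; the ages come back as text.
ages_as_text <- apply(toy, 1, function(row) row[["Age"]])
print(ages_as_text)
```

The last call shows why a row-wise apply is awkward here: the numeric Age values would need converting back with as.numeric() before any comparison or arithmetic.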

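But the main point is that you don't need a loop for this, because most functions in R are vectorized (see chap. 3 of The R Inferno). The following code should work:

df.train$returnedage <- df.train$Age
# Default fill for every missing age, then override it for classes 1 and 2.
# The class column is named Pclass in the Kaggle file (Pclasse / Plasse in the
# question are misspellings), and passenger only exists inside fixe.age.
df.train$returnedage[is.na(df.train$Age)] <- 24
df.train$returnedage[is.na(df.train$Age) & df.train$Pclass == 1] <- 37
df.train$returnedage[is.na(df.train$Age) & df.train$Pclass == 2] <- 29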
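
As a compact alternative to the three assignments above, the same rule can be written with nested ifelse() calls. This is only a sketch on a made-up data frame: `toy` and its values are hypothetical, the column name `Pclass` assumes the standard Kaggle file, and the fill values 37, 29 and 24 are the ones from the question.

```r
# Hypothetical sample: three passengers with missing ages, one with a known age.
toy <- data.frame(
  Pclass = c(1, 2, 3, 1),
  Age    = c(NA, NA, NA, 50)
)

# Class-based fill value for every row: 37 for class 1, 29 for class 2,
# 24 for anything else.
fill <- ifelse(toy$Pclass == 1, 37,
        ifelse(toy$Pclass == 2, 29, 24))

# Keep known ages; use the fill value only where Age is missing.
toy$returnedage <- ifelse(is.na(toy$Age), fill, toy$Age)

print(toy$returnedage)
```

Both versions operate on whole columns at once, so no explicit loop or apply call is needed.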