Krowar - 4 months ago
R Question

Why is this sapply not working on my data-frame? (titanic kaggle)

I have the data frame from the Titanic Kaggle competition and I am trying to replace the NA values in the Age column. To do so, I tried the following code:

df.train <- read.csv('data/titanic_train.csv')

fixe.age <- function(passenger){
  if(is.na(passenger$Age)){
    if(passenger$Pclass == 1){
      returnedage <- 37
    } else if(passenger$Pclass == 2){
      returnedage <- 29
    } else {
      returnedage <- 24
    }
  } else {
    returnedage <- passenger$Age
  }
  returnedage
}

sapply(df.train, fixe.age)

I receive the following error:

Error in passenger$Age : $ operator is invalid for atomic vectors

Is the way that I'm trying to do this totally wrong?

Thanks a lot


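It doesn't work because sapply iterates over the columns of a data frame (a data frame is a list of columns), so passenger is an atomic vector and passenger$Age fails. You are trying to apply the function to rows instead. To do that you would need apply with MARGIN = 1, e.g. apply(df.train, 1, fixe.age). Note that apply also passes each row as an atomic vector, so inside the function you would have to write passenger["Age"] instead of passenger$Age.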
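
To see the difference between the two, here is a minimal sketch on a made-up three-row data frame (toy and fix_age_row are hypothetical names, not part of the Kaggle data or of your code). It assumes all columns are numeric; on the real file, which also has text columns, apply would hand each row over as a character vector, so you would have to convert the values back to numbers.

```r
# Toy stand-in for the Titanic data (values are invented for illustration).
toy <- data.frame(Age = c(22, NA, 35), Pclass = c(3, 1, 2))

# sapply() walks over the columns: the function receives a whole column.
col_classes <- sapply(toy, class)
print(col_classes)

# apply(..., MARGIN = 1) walks over the rows. Each row arrives as a named
# atomic vector, so fields are read with [ ] rather than $.
fix_age_row <- function(row) {
  if (!is.na(row["Age"])) return(unname(row["Age"]))
  if (row["Pclass"] == 1) return(37)
  if (row["Pclass"] == 2) return(29)
  24
}

fixed <- apply(toy, 1, fix_age_row)
print(fixed)  # the values are 22, 37, 35
```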

But the main problem is that you don't need a loop for this, because most functions are vectorized in R (see chap. 3 of The R Inferno). The following code should work:

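# Start from the recorded ages (Pclass is the class column in the Kaggle file)
df.train$returnedage <- df.train$Age
# Default for every missing age, then override for classes 1 and 2
df.train$returnedage[is.na(df.train$Age)] <- 24
df.train$returnedage[is.na(df.train$Age) & df.train$Pclass == 1] <- 37
df.train$returnedage[is.na(df.train$Age) & df.train$Pclass == 2] <- 29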
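
If you want to check the logic without the Kaggle file, the same idea can be sketched on a small made-up data frame. This variant is an alternative, not the code above: it uses a named lookup vector (fill, a hypothetical name) so that all missing ages are filled in one assignment. The fill values 37, 29 and 24 are the ones from the question.

```r
# Toy data: invented values, same column names as the Kaggle file.
toy <- data.frame(Age = c(22, NA, NA, NA, 40), Pclass = c(1, 1, 2, 3, 1))

# Replacement age per passenger class, keyed by the class as a string.
fill <- c("1" = 37, "2" = 29, "3" = 24)

# Logical mask of the rows whose age is missing.
missing <- is.na(toy$Age)

# Copy the ages, then fill the missing ones by looking up each row's class.
toy$returnedage <- toy$Age
toy$returnedage[missing] <- fill[as.character(toy$Pclass[missing])]

print(toy)
```

Both versions are vectorized: the work is done by logical indexing on whole columns, with no explicit loop over passengers.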