jbehrens94
R Question

R remove NA value from factor in split function

I'm using the split() function to group my data.frame into three categories (C, Q or S). However, when I run split(), the result contains four elements instead of three: C, Q, S and an empty string.

I suspect this has to do with an NA value or an empty string. How do I filter it out correctly?
Currently, my code looks like this:

# Read the data from the CSV file.
train.csv <- read.csv("train.csv")

# Create some handy variables
ship.embarked <- split(train.csv, train.csv$Embarked)
ship.pclass <- split(train.csv, train.csv$Pclass)


ship.embarked contains four elements (C, Q, S and an empty string), while I expect three (C, Q and S). How do I solve this correctly?
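A minimal way to reproduce the extra group, assuming a small hypothetical data frame in place of train.csv (the real file is not shown): one passenger has an empty Embarked value, and split() creates a group for every factor level, including "".

```r
# Hypothetical toy data standing in for train.csv; "" marks a missing port
train.csv <- data.frame(
  Embarked = factor(c("S", "C", "", "Q", "S", "C")),
  Pclass   = c(3L, 1L, 1L, 3L, 2L, 1L)
)

# One group per factor level, so "" becomes its own group
ship.embarked <- split(train.csv, train.csv$Embarked)
names(ship.embarked)  # "", "C", "Q" and "S"
```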

Answer

To remove the rows where Embarked is "", convert the column to character, use nzchar() to get a logical vector that is TRUE for every non-empty string, subset the rows with that vector, and then remove the now-unused "" level with droplevels():

train.csv <- droplevels(train.csv[nzchar(as.character(train.csv$Embarked)), ])
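The one-liner can be checked against a small hypothetical data frame (an assumption standing in for the real train.csv). After filtering, only three levels remain, so split() yields three groups.

```r
# Hypothetical toy data; "" marks a passenger with no recorded port
train.csv <- data.frame(
  Embarked = factor(c("S", "C", "", "Q", "S", "C")),
  Pclass   = c(3L, 1L, 1L, 3L, 2L, 1L)
)

# Keep rows whose Embarked value is a non-empty string,
# then drop the "" level, which no longer has any rows
train.csv <- droplevels(train.csv[nzchar(as.character(train.csv$Embarked)), ])
levels(train.csv$Embarked)  # "C", "Q" and "S"

ship.embarked <- split(train.csv, train.csv$Embarked)
names(ship.embarked)        # "C", "Q" and "S"
```

The droplevels() step matters: split() keeps unused factor levels by default (drop = FALSE), so without it the result would still contain an empty, zero-row "" group. Passing drop = TRUE to split() would be another way to discard that empty level.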

Now split() returns only the C, Q and S groups, with no "" group.
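A different approach, not the one in the answer above, is to turn empty fields into NA when reading the file: read.csv() accepts na.strings, and split() drops rows whose grouping value is NA. The sketch below uses a hypothetical inline CSV snippet in place of train.csv.

```r
# Hypothetical CSV snippet standing in for train.csv; passenger 3 has no port
csv_text <- "PassengerId,Pclass,Embarked
1,3,S
2,1,C
3,1,
4,3,Q"

# Treat empty fields (and the literal "NA") as missing values while reading
train.csv <- read.csv(text = csv_text, na.strings = c("", "NA"),
                      stringsAsFactors = TRUE)

# split() discards rows whose Embarked value is NA, so no "" group appears
ship.embarked <- split(train.csv, train.csv$Embarked)
names(ship.embarked)  # "C", "Q" and "S"
```

This keeps the missing port visible as NA in train.csv itself, which may be preferable if those rows are still needed for other analyses.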