Giorgos K - 20 days ago
R Question

Delete NA values from a data frame with R

I have a large data frame (501 rows by 42844 columns) that contains "?_?" values. Using R, I have already replaced them with NA using the code below:

data[data == "?_?"] <- NA


So I have NA values now and I want to omit them from the data frame, but something is going wrong.
When I run the command below:

data_na_rm <- na.omit(data)


I get a 0 x 42844 object as a result.

dim(data_na_rm) #gives me 0 42844
data_na_rm[1,2] #gives me NA
data_na_rm[5,3] #gives me NA
############################
data_na_rm[2] #gives me the title of the second column
data_na_rm[5] #gives me the title of the fifth


What do I have to do? I've spent too many hours on this. I would appreciate it if anyone could take some time to help me.

Answer

As JackStat said in the comments, you might have an NA in every row. na.omit() drops every row that contains at least one NA, so in that case no rows are left. You can test for that:

# Some Data. All rows have an NA but not all columns

df <- data.frame(col1 = c(NA, 2, 3, 4),
                 col2 = c(1, NA, 3, 4),
                 col3 = c(1, 2, NA, 4),
                 col4 = c(1, 2, 3, NA),
                 col5 = c(1, 2, 3, 4))

# test whether an NA is present in each row

apply(df, 1, function(x) {sum(is.na(x)) > 0})
[1] TRUE TRUE TRUE TRUE
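
As a possible shortcut, base R's rowSums() and complete.cases() answer the same per-row question without apply(). The sketch below rebuilds the toy df so that it runs on its own; the names row_has_na and n_complete are just illustrative:

```r
# Toy data from the example above: every row has exactly one NA
df <- data.frame(col1 = c(NA, 2, 3, 4),
                 col2 = c(1, NA, 3, 4),
                 col3 = c(1, 2, NA, 4),
                 col4 = c(1, 2, 3, NA),
                 col5 = c(1, 2, 3, 4))

# TRUE for each row that contains at least one NA
row_has_na <- rowSums(is.na(df)) > 0

# complete.cases() is the complement: TRUE for rows with no NA at all
n_complete <- sum(complete.cases(df))

# na.omit() keeps only the complete rows, so the result has 0 rows
# but still has all 5 columns
dim(na.omit(df))
```

If n_complete is 0 on your real data, that would explain the 0 x 42844 result.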

The following counts the NAs in each column, which will help you find the columns that contribute the most NAs:

apply(df, 2, function(x) {sum(is.na(x))})
col1 col2 col3 col4 col5 
   1    1    1    1    0
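
Once you know the per-column counts, one possible next step is to drop the columns with too many NAs first and only then call na.omit(). This is a minimal sketch on the toy df, not something taken from the question: the threshold max_na is a made-up parameter that you would have to tune for your real data.

```r
# Toy data from the example above
df <- data.frame(col1 = c(NA, 2, 3, 4),
                 col2 = c(1, NA, 3, 4),
                 col3 = c(1, 2, NA, 4),
                 col4 = c(1, 2, 3, NA),
                 col5 = c(1, 2, 3, 4))

# Number of NAs in each column (same counts as the apply() call above)
na_per_col <- colSums(is.na(df))

# Hypothetical threshold: keep only columns with at most max_na NAs
max_na <- 0
df_cols <- df[, na_per_col <= max_na, drop = FALSE]

# With the offending columns gone, na.omit() no longer removes every row
df_clean <- na.omit(df_cols)
dim(df_clean)
```

With max_na set to 0, only col5 survives in this toy example and all 4 rows are kept. Whether it is better to lose columns or rows depends on your data.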