Julian Karls - 3 days ago
R Question

R, is.na.dataset colnames error

When I use the following data.frame

dataSet <- structure(list(J1 = "foo", J2 = structure(0.1, .Dim = c(1L, 1L
))), .Names = c("J1", "J2"), row.names = 1L, class = "data.frame")


then

print(colnames(dataSet))


returns

[1] "J1" "J2"


as expected.

However,

r <- is.na(dataSet)
print(colnames(r))


returns

[1] "J1" ""


Why is this happening? The data.frame is built in this unusual way because the code comes from dput(), after I condensed a real data.frame to a minimal working example. A function I am using assumes that is.na keeps the colnames intact, which holds for most data.frames but not for this one.

Answer

Keep in mind that the second column of your data frame is a 1x1 matrix with no column names of its own.

sapply(dataSet, class)
#          J1          J2 
# "character"    "matrix" 
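The output above comes from an older R release. From R 4.0.0 onward, class() of a matrix is c("matrix", "array"), so sapply(dataSet, class) returns a list instead of a character vector. A version-independent check, sketched here with the data frame from the question, is to ask is.matrix directly:

```r
# Data frame from the question: J2 is a 1x1 matrix without column names.
dataSet <- structure(list(J1 = "foo", J2 = structure(0.1, .Dim = c(1L, 1L))),
                     .Names = c("J1", "J2"), row.names = 1L,
                     class = "data.frame")

# vapply guarantees a logical vector with one element per column.
isMat <- vapply(dataSet, is.matrix, logical(1))
print(isMat)
#    J1    J2 
# FALSE  TRUE 

# The matrix column carries no column names of its own.
print(colnames(dataSet$J2))
# NULL
```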

Now let's have a look at what's happening in is.na. The first few lines of the data frame method for is.na are

head(is.na.data.frame, 5)
#                                              
# 1 function (x)                                
# 2 {                                           
# 3     y <- if (length(x)) {                   
# 4        do.call("cbind", lapply(x, "is.na"))
# 5     } 

is.na.data.frame is written in R, so we can easily debug the problem ourselves by plugging our data set into the steps.

lapply(dataSet, is.na)
# $J1
# [1] FALSE
#
# $J2
#       [,1]
# [1,] FALSE

do.call(cbind, lapply(dataSet, is.na))
#         J1      
# [1,] FALSE FALSE

So the column name is lost in the cbind step, not in is.na itself. Now, if we go to help(cbind), we find

"For cbind (rbind) the column (row) names are taken from the colnames (rownames) of the arguments if these are matrix-like."

The argument in question here is the matrix stored in the second column. Because it is matrix-like, cbind takes the column name from the matrix's own colnames rather than from the list name J2 that lapply carried over from the data frame. The matrix has no colnames, so the second column name comes out as an empty string.
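The naming rule can be seen in isolation, without a data frame. The following is a minimal sketch; the object names v, m, unnamed, named and the column name "inner" are made up for illustration.

```r
# A plain vector argument takes its column name from the argument name.
v <- FALSE
# A matrix argument takes its column name from its own colnames (absent here).
m <- matrix(FALSE, nrow = 1, ncol = 1)

unnamed <- cbind(J1 = v, J2 = m)
print(colnames(unnamed))
# [1] "J1" ""

# Once the matrix has colnames, cbind uses them, not the argument name J2.
colnames(m) <- "inner"
named <- cbind(J1 = v, J2 = m)
print(colnames(named))
# [1] "J1"    "inner"
```

In both calls the argument name J2 is ignored for the matrix, which is exactly what happens inside is.na.data.frame.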

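A quick solution to this particular problem is to pass every column through c(), which drops the dim attribute so that the second column becomes a plain vector. Because lapply returns a list rather than a data frame, the result is a named logical vector, so the names are read with names() rather than colnames().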

is.na(lapply(dataSet, c))
#    J1    J2 
# FALSE FALSE 
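If the downstream function needs the matrix that is.na returns for a data frame, with colnames intact, one alternative is to flatten single-column matrix columns before calling is.na. This is a sketch rather than a general fix: flattenMatrixCols is a hypothetical helper name, not part of base R, and it deliberately leaves matrices with more than one column untouched.

```r
# Data frame from the question: J2 is a 1x1 matrix without column names.
dataSet <- structure(list(J1 = "foo", J2 = structure(0.1, .Dim = c(1L, 1L))),
                     .Names = c("J1", "J2"), row.names = 1L,
                     class = "data.frame")

# Hypothetical helper (not part of base R): turn every one-column matrix
# column into a plain vector and leave all other columns as they are.
flattenMatrixCols <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.matrix(col) && ncol(col) == 1L) as.vector(col) else col
  })
  df
}

r <- is.na(flattenMatrixCols(dataSet))
print(colnames(r))
# [1] "J1" "J2"
```

Assigning through df[] keeps the object a data frame, so is.na still dispatches to the data frame method and returns a logical matrix.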