James Anthony Perez - 3 years ago
R Question

Why does order() in R generate NAs when passing in a subsetted dataframe?

I'm having a little trouble understanding what is going on here. It appears to me that both methods for ordering the data frame below are equivalent.

Our dataframe,

cols <- c("chr","id","value")
df <- data.frame(c(1:5),c("ENSG1","ENSG2","ENSG3","ENSG4","ENSG5"),runif(5,5.0,10.0))
names(df) <- cols
df <- df[sample(nrow(df)),]
df

chr id value
5 ENSG5 8.913645
2 ENSG2 6.117744
4 ENSG4 8.558403
3 ENSG3 9.625546
1 ENSG1 6.105577


Now, method 1:

df[order(df[,c("chr","id")]),]

chr id value
1 ENSG1 6.105577
2 ENSG2 6.117744
3 ENSG3 9.625546
4 ENSG4 8.558403
5 ENSG5 8.913645
NA <NA> NA
NA <NA> NA
NA <NA> NA
NA <NA> NA
NA <NA> NA


This throws in NAs for some curious reason, while passing in df columns to order(), as in,

method 2:

df[order(df$chr,df$id),]

chr id value
1 ENSG1 6.105577
2 ENSG2 6.117744
3 ENSG3 9.625546
4 ENSG4 8.558403
5 ENSG5 8.913645


does not.

Can someone explain why method 1 and method 2 are not interchangeable?

Answer Source

When we look at ?order, its first arguments are documented as:

a sequence of numeric, complex, character or logical vectors, all of the same length, or a classed R object.

Nothing there really suggests that it would work on a data frame. A "classed R object" is a bit vague, and suggests that a data frame won't throw an error, but it certainly doesn't say "or a data frame".

The Description says:

See the examples for how to use these functions to sort data frames, etc.

When you call order on a data frame, you can see what happens:

order(data.frame(a = 1:5, b = 5:1))
# [1]  1 10  2  9  3  8  4  7  5  6

It looks like it coerces the data frame to a vector and orders that, which is not generally very useful. This is why you get the NA rows when you run df[order(df[,c("chr","id")]),]. Your input data frame had 2 columns, so the order() output had twice as many elements as the data frame has rows, and the row indices beyond nrow(df) come back as rows of NA.
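
A minimal sketch of where the NA rows come from, assuming (as the output above suggests) that the two columns end up concatenated into one vector of length 10. The data frame d and the names flat, idx and res are only illustrative, and the sketch deliberately calls order() on a plain vector, since the result of calling it on a data frame directly may differ between R versions.

```r
# A 5-row, 2-column data frame like the one in the example above
d <- data.frame(a = 1:5, b = 5:1)

# Assumption: the coercion behaves like concatenating the columns
flat <- c(d$a, d$b)
idx <- order(flat)      # one index per element of the flattened vector

length(idx)             # 10, twice nrow(d)

# Row indices 6..10 do not exist in d, so each one yields a row of NAs
res <- d[idx, ]
nrow(res)               # 10
sum(is.na(res$a))       # 5
```

Half of the indices point past the last row of d, which is why half of the rows in the result are NA.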

You have already found the correct way to order a data frame, which is to give actual vectors to order. The vectors can be individual columns of your data frame or they can be other vectors of the correct length.
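
If the sort keys are already a subset of the data frame, one common idiom is do.call(), which hands each column to order() as a separate vector. The sketch below uses the values from the question, typed in by hand; the names keys, by_cols and by_list are illustrative.

```r
# The shuffled data frame from the question, with its values typed in by hand
df <- data.frame(
  chr   = c(5L, 2L, 4L, 3L, 1L),
  id    = c("ENSG5", "ENSG2", "ENSG4", "ENSG3", "ENSG1"),
  value = c(8.913645, 6.117744, 8.558403, 9.625546, 6.105577),
  stringsAsFactors = FALSE
)

# Method 2 from the question: each column is passed as its own vector
by_cols <- df[order(df$chr, df$id), ]

# Same idea for a subsetted data frame: turn it into a plain list of columns
# and let do.call() spread that list over order()'s arguments.
# unname() keeps column names from being matched to order()'s named arguments.
keys <- unname(as.list(df[, c("chr", "id")]))
by_list <- df[do.call(order, keys), ]

identical(by_cols, by_list)   # TRUE
nrow(by_list)                 # 5
```

Both calls give order() two vectors of length 5, so it returns five row indices and no NA rows appear.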
