cherrytree - 2 months ago
R Question

reshape2 melt warning message

I'm using melt and encounter the following warning message:

attributes are not identical across measure variables; they will be dropped


After looking around, I found people saying this happens because the variables are of different classes; however, that is not the case with my dataset.

Here is the dataset:

test <- structure(list(park = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("miss", "piro", "sacn", "slbe"), class = "factor"),
a1.one = structure(c(3L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 3L,
3L), .Label = c("agriculture", "beaver", "development", "flooding",
"forest_pathogen", "harvest_00_20", "harvest_30_60", "harvest_70_90",
"none"), class = "factor"), a2.one = structure(c(6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("development",
"forest_pathogen", "harvest_00_20", "harvest_30_60", "harvest_70_90",
"none"), class = "factor"), a3.one = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("forest_pathogen",
"harvest_00_20", "none"), class = "factor"), a1.two = structure(c(3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("agriculture",
"beaver", "development", "flooding", "forest_pathogen", "harvest_00_20",
"harvest_30_60", "harvest_70_90", "none"), class = "factor"),
a2.two = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L), .Label = c("development", "forest_pathogen", "harvest_00_20",
"harvest_30_60", "harvest_70_90", "none"), class = "factor"),
a3.two = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("forest_pathogen", "harvest_00_20", "none"
), class = "factor")), .Names = c("park", "a1.one", "a2.one",
"a3.one", "a1.two", "a2.two", "a3.two"), row.names = c(NA, 10L
), class = "data.frame")


And here is the structure:

str(test)
'data.frame': 10 obs. of 7 variables:
$ park : Factor w/ 4 levels "miss","piro",..: 1 1 1 1 1 1 1 1 1 1
$ a1.one: Factor w/ 9 levels "agriculture",..: 3 1 3 3 3 3 1 3 3 3
$ a2.one: Factor w/ 6 levels "development",..: 6 6 6 6 6 6 6 6 6 6
$ a3.one: Factor w/ 3 levels "forest_pathogen",..: 3 3 3 3 3 3 3 3 3 3
$ a1.two: Factor w/ 9 levels "agriculture",..: 3 3 3 3 3 3 3 3 3 3
$ a2.two: Factor w/ 6 levels "development",..: 6 6 6 6 6 6 6 6 6 6
$ a3.two: Factor w/ 3 levels "forest_pathogen",..: 3 3 3 3 3 3 3 3 3 3


Is it because the number of levels is different for each variable? If so, can I just ignore the warning message in this case?

To generate the warning message:

library(reshape2)
test.m <- melt(test, id.vars = "park")
Warning message:
attributes are not identical across measure variables; they will be dropped


Thanks.

Answer

An explanation:

When you melt, you are combining multiple columns into one. In this case, you are combining factor columns, each of which has a levels attribute. These levels are not the same across columns because your factors are actually different. melt just coerces each factor to character and drops their attributes when creating the value column in the result.
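
As a rough base-R sketch of that coercion (an illustration only, not reshape2's actual internals), the two hypothetical factors f1 and f2 below stand in for two measure columns with different levels. Their attributes differ, so the only safe common representation is a plain character vector:

```r
# Two hypothetical factor columns with different level sets.
f1 <- factor(c("development", "none"))
f2 <- factor(c("none", "forest_pathogen"))

# The attributes (levels and class) are not identical across the two.
attrs_match <- identical(attributes(f1), attributes(f2))

# Stacking them safely means falling back to character values,
# which carry no levels attribute at all.
value <- c(as.character(f1), as.character(f2))

print(attrs_match)
print(value)
print(attributes(value))
```

The stacked vector keeps every label but no longer records which labels were possible in each original column.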

In this case the warning doesn't matter, but you need to be very careful when combining columns that are not of the same "type", where "type" means not just the vector type but, more generally, the kind of thing the values represent. For example, I would not want to melt a column containing speeds in mph together with one containing weights in pounds.

One way to confirm that it is okay to combine your factor columns is to ask yourself whether any possible value in one column would be a reasonable value to have in every other column. If that is the case, then likely the correct thing to do would be to ensure that every factor column has all the possible levels that it could accept (in the same order). If you do this, you will not get a warning when you melt the table.
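
A minimal sketch of that idea, assuming the full set of acceptable values is simply the union of the levels already present. The data frame df is a made-up miniature of the question's data, and measure_cols and all_levels are names introduced here for illustration:

```r
# Hypothetical miniature of the question's data: two factor columns
# whose level sets differ.
df <- data.frame(
  park   = factor(c("miss", "miss", "piro")),
  a1.one = factor(c("development", "agriculture", "none")),
  a2.one = factor(c("none", "none", "harvest_00_20"))
)

measure_cols <- setdiff(names(df), "park")

# Union of every level seen in any measure column, in one fixed order.
all_levels <- sort(unique(unlist(lapply(df[measure_cols], levels))))

# Rebuild each measure column against the shared level set.
df[measure_cols] <- lapply(df[measure_cols], factor, levels = all_levels)

# Check that every measure column now carries identical levels.
same_levels <- all(vapply(df[measure_cols],
                          function(col) identical(levels(col), all_levels),
                          logical(1)))
print(same_levels)

# Melting is only attempted when reshape2 is installed.
if (requireNamespace("reshape2", quietly = TRUE)) {
  long <- reshape2::melt(df, id.vars = "park", factorsAsStrings = FALSE)
  str(long$value)
}
```

If some values are valid but never observed, the union will miss them; in that case it is safer to write the level vector out by hand.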

An illustration:

library(reshape2)
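# stringsAsFactors=TRUE is needed on R >= 4.0, where strings are no longer converted to factors by default
DF <- data.frame(id=1:3, x=letters[1:3], y=rev(letters)[1:3], stringsAsFactors=TRUE)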
str(DF)

The levels for x and y are not the same:

'data.frame':  3 obs. of  3 variables:
$ id: int  1 2 3
$ x : Factor w/ 3 levels "a","b","c": 1 2 3
$ y : Factor w/ 3 levels "x","y","z": 3 2 1

Here we melt and look at the column that x and y were melted into (value):

melt(DF, id.vars="id")$value

We get a character vector and a warning:

[1] "a" "b" "c" "z" "y" "x"
Warning message:
attributes are not identical across measure variables; they will be dropped 

If however we reset the factors to have the same levels and only then melt:

DF[2:3] <- lapply(DF[2:3], factor, levels=letters)
melt(DF, id.vars="id", factorsAsStrings=FALSE)$value

We get the correct factor and no warnings:

[1] a b c z y x
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z

By default, melt converts factors to character even when their levels are identical, which is why we turn off factorsAsStrings above. Without that setting you would have gotten a character vector, but no warning. I would argue the default should be to keep the result as a factor, but that is not how melt behaves.
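
If you have already melted with the defaults and ended up with a character column, one option is to rebuild the factor afterwards. This is a sketch in base R, assuming you know the full level set (here letters, as in the illustration); the vector value stands in for the melted column:

```r
# Stand-in for the character column produced by a default melt.
value <- c("a", "b", "c", "z", "y", "x")

# Rebuild the factor against the full, known level set.
value_factor <- factor(value, levels = letters)

print(value_factor)
```

Any value that is not in the supplied levels becomes NA, so it is worth checking with anyNA() after the conversion.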