ElTinusDeluxe - 2 months ago
R Question

Creating syntactically valid names from a factor in R while retaining levels

I am making a bioinformatics Shiny app that reads user-supplied group names from an Excel file. As these names may not be syntactically valid R names, I would like to represent them internally as valid names.

As an example, I can have this input:

(grps <- as.factor(c("T=0","T=0","T=4-","T=4+","T=4+")))
[1] T=0 T=0 T=4- T=4+ T=4+
Levels: T=0 T=4- T=4+


Ideally, I would like R to make valid names, but keep the groups/levels the same, for instance the following would be fine:
"T.0" "T.0" "T.4minus" "T.4plus" "T.4plus"

When using make.names(), however, every invalid character is converted to the same character (a dot):

(grps2 <- as.factor(make.names(grps)))
[1] T.0 T.0 T.4. T.4. T.4.
Levels: T.0 T.4.


So both T=4- and T=4+ are given the same name and a level is lost (which causes problems in subsequent analyses). Also, setting unique=TRUE does not solve the problem, because

(grps3 <- as.factor(make.names(grps,unique=TRUE)))
[1] T.0 T.0.1 T.4. T.4..1 T.4..2
Levels: T.0 T.0.1 T.4. T.4..1 T.4..2


and every element gets its own name: T=0 and T=4+ are each split into 2 different groups, so levels are gained.

Does anybody know a general way to convert the levels of a factor into valid names while keeping the same grouping?
Please keep in mind that user input can vary widely, so manually replacing "-" with "minus" does not work here.

Thanks in advance for your help!

Answer

With the mapvalues function from plyr you can do:

library(plyr)
# Rename each level once, so identical groups keep identical names
mapvalues(grps, from = levels(grps), to = make.names(levels(grps), unique = TRUE))

Since make.names() is applied to the levels rather than to the individual elements of the factor, each level is renamed exactly once and the number of levels stays the same.
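
The same idea can probably be done in base R without plyr by assigning to levels() directly. The following is only a sketch using the example data from the question; the names lookup, valid_grps and original are made up for illustration, and the lookup table is an optional extra for mapping the valid names back to the user's labels (for example when displaying results in the app).

```r
# Example input from the question
grps <- factor(c("T=0", "T=0", "T=4-", "T=4+", "T=4+"))

# Hypothetical lookup table: original label -> syntactically valid name.
# make.names() sees each level only once, so unique = TRUE can only
# separate levels that would otherwise collide (here "T=4-" and "T=4+").
lookup <- setNames(make.names(levels(grps), unique = TRUE), levels(grps))

# Rename the levels in place; the grouping of the elements is unchanged
valid_grps <- grps
levels(valid_grps) <- unname(lookup[levels(valid_grps)])

valid_grps
# [1] T.0    T.0    T.4.   T.4..1 T.4..1
# Levels: T.0 T.4. T.4..1

# Map the valid names back to the labels the user supplied
original <- names(lookup)[match(as.character(valid_grps), lookup)]
```

The resulting names are valid but not especially readable ("T.4..1" rather than "T.4plus"), which is the price of not knowing in advance which characters users will supply; keeping the lookup table around lets the app show the original labels anyway.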
