Recoding variables in R seems to be my biggest headache. What functions, packages, and processes do you use to ensure the best result?
I've found very few useful examples on the Internet that give a one-size-fits-all solution to recoding, and I'm interested to see what you guys and gals are using.
Note: This may be a community wiki topic.
Recoding can mean a lot of things, and is fundamentally complicated.
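Changing the levels of a factor can be done using the levels() replacement function:

    # the veteran data set ships with the survival package
    library(survival)
    # change the levels of a factor (new labels replace the old levels in order)
    levels(veteran$celltype) <- c("s", "sc", "a", "l")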
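To make the positional behaviour of levels<- explicit, here is a small self-contained sketch on a made-up factor (the vectors f and g are hypothetical, not the veteran data). It also shows that repeating a label in the new levels merges the corresponding old levels, which is handy for collapsing categories.

```r
# Hypothetical factor using the same four cell-type levels as veteran$celltype
f <- factor(c("squamous", "adeno", "large", "smallcell", "adeno"),
            levels = c("squamous", "smallcell", "adeno", "large"))

# levels<- relabels by position: the i-th new label replaces the i-th old level
levels(f) <- c("s", "sc", "a", "l")
print(f)

# Repeating a label merges levels: "a" and "l" both become "other"
g <- f
levels(g) <- c("s", "sc", "other", "other")
print(table(g))
```

Because the relabeling is positional, it is worth checking levels() before assigning; getting the order wrong silently mislabels the data rather than raising an error.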
Transforming a continuous variable simply involves applying a vectorized function:

    mtcars$mpg.log <- log(mtcars$mpg)
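Any vectorized function works the same way. A brief sketch on a copy of mtcars (the copy df and the column names mpg.log and mpg.z are illustrative choices, not part of the original answer):

```r
# Work on a copy so the built-in mtcars data set is left untouched
df <- mtcars
# Natural log, applied elementwise to the whole column
df$mpg.log <- log(df$mpg)
# Standardize to mean 0, sd 1; scale() returns a matrix, so convert it to a plain vector
df$mpg.z <- as.numeric(scale(df$mpg))
head(df[, c("mpg", "mpg.log", "mpg.z")])
```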
For binning continuous data, look at cut (base R) and cut2 (in the Hmisc package). For example:

    library(Hmisc)
    # make 4 groups with (roughly) equal sample sizes
    mtcars[["mpg.tr"]] <- cut2(mtcars[["mpg"]], g = 4)
    # make 4 groups with equal bin width
    mtcars[["mpg.tr2"]] <- cut(mtcars[["mpg"]], 4, include.lowest = TRUE)
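If you would rather not depend on Hmisc, equal-frequency bins can be approximated in base R by passing sample quantiles to cut as breakpoints. This is a sketch, not a drop-in replacement for cut2: the interval labels differ, and tied values can make the group sizes uneven.

```r
x <- mtcars$mpg

# Quartile breakpoints give (roughly) equal-count groups;
# unique() guards against repeated breakpoints when x has many ties
brks <- unique(quantile(x, probs = seq(0, 1, by = 0.25)))
mpg.q <- cut(x, breaks = brks, include.lowest = TRUE)
table(mpg.q)

# Equal-width bins for comparison (same call as in the answer above)
mpg.w <- cut(x, breaks = 4, include.lowest = TRUE)
table(mpg.w)
```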
For recoding continuous or factor variables into a categorical variable, there are recode in the car package and recode.variables in the Deducer package:
    library(Deducer)
    mtcars[c("mpg.tr2")] <- recode.variables(mtcars[c("mpg")], "Lo:14 -> 'low';14:24 -> 'mid';else -> 'high';")
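The same low/mid/high recode can be sketched in base R with cut and explicit labels, assuming values exactly on a boundary (14 or 24) should fall into the lower category; check this against how the package you use resolves the overlapping ranges. For factor or other discrete codes, a named lookup vector does the same job; the cyl labels below are purely illustrative.

```r
# Continuous -> categorical: cut() intervals are (a, b], so 14 -> "low" and 24 -> "mid"
mpg.cat <- cut(mtcars$mpg,
               breaks = c(-Inf, 14, 24, Inf),
               labels = c("low", "mid", "high"))
table(mpg.cat)

# Discrete codes -> categories via a named lookup vector (labels are hypothetical)
cyl.lab <- c("4" = "small", "6" = "medium", "8" = "large")
cyl.cat <- factor(unname(cyl.lab[as.character(mtcars$cyl)]),
                  levels = c("small", "medium", "large"))
table(cyl.cat)
```

Any code missing from the lookup vector comes back as NA, which makes unmapped values easy to spot with anyNA().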
If you are looking for a GUI, Deducer implements recoding with its Transform and Recode dialogs.