Craig Hamilton - 1 year ago
R Question

Converting Likert Data to Numeric Across A Data Frame

I have a dataset with responses to 90 Likert items that I would like to convert to numeric values. It is structured like the example here:

q6 <- c("Daily", "Never", "Often", "Very Often", "Daily")
q7 <- c("Never", "Never", "Often", "Often", "Daily")
q23 <- c("Daily", "Often", "Never", "Never", "Neutral")
q17 <- c("Important", "Important", "Very Important", "Neutral", "Not Important")
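# data.frame() rather than cbind(): cbind() would build a character matrix, not a data frame
example <- data.frame(q6, q7, q17, q23, stringsAsFactors = FALSE)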

The responses to each question differ slightly, but are mainly in one of three ranges: Strongly Disagree to Strongly Agree, Daily to Never, or Important to Not Important. The responses to each of the 90 questions are in a separate column (labelled q1 to q90). I'd like to create a new column for each set of responses, with a numeric value that corresponds to the text response (Strongly Agree (3) to Strongly Disagree (-3), via Neutral (0)). Like so:

q6 <- c("Daily", "Never", "Often", "Very Often", "Daily")
n6 <- c(3,-3,1,2,3)
q17 <- c("Important", "Important", "Very Important", "Neutral", "Not Important")
n17 <- c(2,2,3,0,-3)
# data.frame() keeps n6 and n17 numeric; cbind() would turn them into strings
num_example <- data.frame(q6, n6, q17, n17)
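The mapping in the example above can also be written as a named lookup vector, where indexing by the text response returns its score. This is a minimal base-R sketch, not the asker's code: the names `freq_scale` and `imp_scale` are hypothetical, the frequency labels and scores come from the ifelse chain in this question, and the importance scale only lists the four labels that appear in the example.

```r
# Hypothetical lookup vectors: names are the text responses, values the scores.
freq_scale <- c("Daily" = 3, "Very Often" = 2, "Often" = 1,
                "Neither Rarely nor Often" = 0, "Rarely" = -1,
                "Very Rarely" = -2, "Never" = -3)
# Only the importance labels that appear in the example are known here.
imp_scale <- c("Very Important" = 3, "Important" = 2,
               "Neutral" = 0, "Not Important" = -3)

q6  <- c("Daily", "Never", "Often", "Very Often", "Daily")
q17 <- c("Important", "Important", "Very Important", "Neutral", "Not Important")

# Indexing by name looks up each response; unname() drops the label names.
# A response missing from the vector comes back as NA.
# If a column is a factor, wrap it in as.character() first, otherwise
# R indexes by the factor's integer codes instead of its labels.
n6  <- unname(freq_scale[q6])
n17 <- unname(imp_scale[q17])

n6   # [1]  3 -3  1  2  3
n17  # [1]  2  2  3  0 -3
```

Unlike the nested ifelse, adding a label is just adding one name/value pair to the vector.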

I've managed to get this far with the code below, which generates a new variable called n6 that matches the text responses in the existing q6 column; I can then add it to the existing data frame using cbind. My question is: how would I automate this across the entire data frame of 90 questions without having to run the code below for each response (i.e. changing q6 to q7, then to q8, and so on)?

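# Nested ifelse: each level maps one label; anything unmatched gets 5
n6 <- ifelse(example[, "q6"] == "Daily", 3,
      ifelse(example[, "q6"] == "Very Often", 2,
      ifelse(example[, "q6"] == "Often", 1,
      ifelse(example[, "q6"] == "Neither Rarely nor Often", 0,
      ifelse(example[, "q6"] == "Rarely", -1,
      ifelse(example[, "q6"] == "Very Rarely", -2,
      ifelse(example[, "q6"] == "Never", -3, 5)))))))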

For further reference, columns q6:q12 and q23:q30 have responses ranging from Daily to Never, as in the example above. Columns q17:q22 have responses ranging from Not Important to Very Important, and columns q49:q90 have responses ranging from Strongly Agree to Strongly Disagree. I'm trying to find a smarter way of running the code above over the relevant columns (e.g. q6:q12, q23:q30) that generates a new data frame with numeric values in columns named n6:n12, n23:n30, rather than having to run it 90 times!
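One way to convert several column ranges at once and name the results n6, n7, and so on is to pair each group of columns with its own lookup vector (a named vector indexed by the text response). The sketch below is an illustration under assumptions, not the answer's method: the helper `convert_group()`, the scale vectors, and the small three-column data frame are hypothetical, and only the labels shown in the question are mapped. For the real survey you would list the full ranges, e.g. `paste0("q", c(6:12, 23:30))`.

```r
# Hypothetical scales built from the labels given in the question.
freq_scale <- c("Daily" = 3, "Very Often" = 2, "Often" = 1,
                "Neither Rarely nor Often" = 0, "Rarely" = -1,
                "Very Rarely" = -2, "Never" = -3)
imp_scale <- c("Very Important" = 3, "Important" = 2,
               "Neutral" = 0, "Not Important" = -3)

# Hypothetical helper: score the columns `cols` of `df` with `scale` and
# return them as a data frame whose names swap the leading "q" for "n".
convert_group <- function(df, cols, scale) {
  out <- lapply(df[cols], function(x) unname(scale[as.character(x)]))
  out <- as.data.frame(out)
  names(out) <- sub("^q", "n", cols)
  out
}

example <- data.frame(
  q6  = c("Daily", "Never", "Often", "Very Often", "Daily"),
  q7  = c("Never", "Never", "Often", "Often", "Daily"),
  q17 = c("Important", "Important", "Very Important", "Neutral", "Not Important"),
  stringsAsFactors = FALSE
)

# One entry per group of columns that share a response scale.
groups <- list(
  list(cols = c("q6", "q7"), scale = freq_scale),
  list(cols = "q17",         scale = imp_scale)
)

# Convert each group, then bind the n* columns next to the original q* columns.
numeric_parts <- lapply(groups, function(g) convert_group(example, g$cols, g$scale))
num_example <- cbind(example, do.call(cbind, numeric_parts))
num_example
```

Responses that a scale does not list become NA, which makes gaps in a scale easy to spot.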

Hope this is a clear explanation of the issue.

Thank you.

Answer

There are faster ways, but since you have already done all that work, turn your current process into a function and then use sapply to go over all the columns:

Notice that I replaced q6 with [,x], so the function picks the column named by its argument x:

# x is a column name; the data frame h16 must exist (see below).
# Labels not in the frequency scale get 5.
numConvert <- function(x) ifelse(h16[,x]=="Daily", 3,
                    ifelse(h16[,x]=="Very Often", 2,
                           ifelse(h16[,x]=="Often", 1,
                                  ifelse(h16[,x]=="Neither Rarely nor Often", 0,
                                         ifelse(h16[,x]=="Rarely", -1,
                                                ifelse(h16[,x]=="Very Rarely", -2,
                                                       ifelse(h16[,x]=="Never", -3, 5)))))))

Now the function accepts column names and converts based on your specification. Try it out:

h16 <- example
sapply(colnames(example), numConvert)
#      q6 q7 q17 q23
# [1,]  3 -3   5   3
# [2,] -3 -3   5   1
# [3,]  1  1   5  -3
# [4,]  2  1   5  -3
# [5,]  3  3   5   5
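The function above reads h16 from the global environment, which is why the example first copies example into h16. A variant that takes the data frame as an argument avoids that, and because the frequency labels form an ordered scale, match() against the labels listed from Never to Daily, minus 4, gives the same -3 to 3 scores. This is a sketch under those assumptions: `freq_levels` and `num_convert2` are hypothetical names, and unmatched labels come back as NA rather than 5.

```r
# Hypothetical: frequency labels in order from -3 (Never) to 3 (Daily).
freq_levels <- c("Never", "Very Rarely", "Rarely", "Neither Rarely nor Often",
                 "Often", "Very Often", "Daily")

# Hypothetical variant of numConvert: the data frame is passed in explicitly.
# match() returns the position 1..7, so subtracting 4 maps it onto -3..3;
# labels not in freq_levels give NA.
num_convert2 <- function(col, data) {
  match(as.character(data[[col]]), freq_levels) - 4
}

example <- data.frame(
  q6  = c("Daily", "Never", "Often", "Very Often", "Daily"),
  q7  = c("Never", "Never", "Often", "Often", "Daily"),
  q23 = c("Daily", "Often", "Never", "Never", "Neutral"),
  stringsAsFactors = FALSE
)

# Extra named arguments to sapply() are passed on to the function.
res <- sapply(c("q6", "q7", "q23"), num_convert2, data = example)
res
```

Here the "Neutral" response in q23 shows up as NA, where numConvert gave 5.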


If you want to use a shiny new function, try case_when, available in dplyr >= 0.5.0:

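library(dplyr)

# One case_when covers both the frequency and the importance labels
factorise <- function(x) {
  case_when(x %in% c("Daily", "Very Important") ~ 3,
            x %in% c("Very Often", "Important") ~ 2,
            x %in% c("Often") ~ 1,
            x %in% c("Neutral") ~ 0,
            x %in% c("Never", "Not Important") ~ -3)
}

# apply() over columns (MARGIN = 2) works whether example is a matrix or a data frame
apply(example, 2, factorise)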
#      q6 q7 q17 q23
# [1,]  3 -3   2   3
# [2,] -3 -3   2   1
# [3,]  1  1   3  -3
# [4,]  2  1   0  -3
# [5,]  3  3  -3   0
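Because case_when() returns NA when no condition matches (and the ifelse version returns 5), it may be worth checking which text responses your labels do not cover before converting all 90 columns. A minimal base-R sketch, assuming hypothetical names `known_labels`, `freq_cols` and `unmapped`: list every distinct response in a group of columns that is absent from the labels you have mapped.

```r
example <- data.frame(
  q6  = c("Daily", "Never", "Often", "Very Often", "Daily"),
  q7  = c("Never", "Never", "Often", "Often", "Daily"),
  q17 = c("Important", "Important", "Very Important", "Neutral", "Not Important"),
  q23 = c("Daily", "Often", "Never", "Never", "Neutral"),
  stringsAsFactors = FALSE
)

# Hypothetical: the frequency labels from the question, checked against q6, q7 and q23.
known_labels <- c("Daily", "Very Often", "Often", "Neither Rarely nor Often",
                  "Rarely", "Very Rarely", "Never")
freq_cols <- c("q6", "q7", "q23")

# Every distinct response in those columns that the labels do not cover.
all_responses <- unique(unlist(lapply(example[freq_cols], as.character)))
unmapped <- setdiff(all_responses, known_labels)
unmapped
```

For the example data this flags the "Neutral" response in q23, the same cell that numConvert scored as 5.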