user3594490 - 1 month ago 8

R Question

I am trying to create a new index variable from 0-100% from multiple survey items (q1:q15) with a 7pt. response scale. I found an example in Excel that will accomplish this using the Count function but I would like to automate this with R. There are missing data for some of the items.

I also need to specify the following criteria: At least one item from each of the following groups of items (q1:q3,q4:q6,q7:q9, q10:q12) must have data or the Index should result in NA.

`structure(list(q1 = c(6L, 5L, 2L, 7L), q2 = c(5L, 5L, 6L, 7L),`

q3 = c(4L, NA, 2L, 7L), q4 = c(NA, 5L, 5L, 5L), q5 = c(NA,

6L, 6L, 6L), q6 = c(NA, 6L, 2L, 7L), q7 = c(6L, 7L, 7L, 7L

), q8 = c(5L, 6L, NA, 7L), q9 = c(6L, 5L, 7L, 7L), q10 = c(7L,

NA, 5L, 7L), q11 = c(6L, 5L, 7L, 7L), q12 = c(6L, 6L, 2L,

7L), q13 = c(6L, 5L, 6L, 7L), q14 = c(4L, 4L, 1L, 7L), q15 = c(6L,

7L, 4L, 7L)), .Names = c("q1", "q2", "q3", "q4", "q5", "q6",

"q7", "q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15"), class = "data.frame", row.names = c(NA,

-4L))

The coding works in Excel (except for the criteria about non-missing data in the groups of items):

`=(SUM(q1:q15)-COUNT(q1:q15))/((COUNT(q1:q15)*7)-COUNT(q1:q15))*100`

Thanks for assistance.

The calculation for the Index above is [the sum for items q1:q15 minus the minimum possible sum (since my scale is 1-7 this would be the number of non-missing items * 1)] divided by [the maximum possible sum (on a scale of 1-7 the number of non-missing items * 7) - the minimum possible sum]. The output I'm trying to generate adds a new variable "Index" with a value in that colum for each row:

`structure(list(q1 = c(6L, 5L, 2L, 7L), q2 = c(5L, 5L, 6L, 7L),`

q3 = c(4L, NA, 2L, 7L), q4 = c(NA, 5L, 5L, 5L), q5 = c(NA,

6L, 6L, 6L), q6 = c(NA, 6L, 2L, 7L), q7 = c(6L, 7L, 7L, 7L

), q8 = c(5L, 6L, NA, 7L), q9 = c(6L, 5L, 7L, 7L), q10 = c(7L,

NA, 5L, 7L), q11 = c(6L, 5L, 7L, 7L), q12 = c(6L, 6L, 2L,

7L), q13 = c(6L, 5L, 6L, 7L), q14 = c(4L, 4L, 1L, 7L), q15 = c(6L,

7L, 4L, 7L), Index = c(NA, 75.64102564, 57.14285714, 96.66666667

)), .Names = c("q1", "q2", "q3", "q4", "q5", "q6", "q7",

"q8", "q9", "q10", "q11", "q12", "q13", "q14", "q15", "Index"

), class = "data.frame", row.names = c(NA, -4L))

Answer

Something like this should work:

Let's define a function to work on a single row.
I first check the inputs by putting the row into a 3-row matrix and summing the NA values in each column. If any of the columns have 3 missing values, we return `NA`

as per your input check.

Then I attempted to simplify your formula. I believe `SUM(q1:q15)-COUNT(q1:q15)`

can be simplified to `sum(x - 1)`

and `COUNT(q1:q15)*7-COUNT(q1:q15)`

is `COUNT(q1:q15)*6`

is `sum((!is.na(x)) * 6)`

.

```
f = function(x) {
if (any(colSums(matrix(is.na(x), nrow = 3)) == 3)) return(NA)
sum(x - 1, na.rm = T) / sum((!is.na(x)) * 6) * 100
}
apply(df, 1, f)
# [1] NA 75.64103 57.14286 96.66667
```

**Edit**

To add this to the original data as a column, just assign it:

```
res = df
res$Index = apply(df, 1, f)
res
# q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14 q15 Index
# 1 6 5 4 NA NA NA 6 5 6 7 6 6 6 4 6 NA
# 2 5 5 NA 5 6 6 7 6 5 NA 5 6 5 4 7 75.64103
# 3 2 6 2 5 6 2 7 NA 7 5 7 2 6 1 4 57.14286
# 4 7 7 7 5 6 7 7 7 7 7 7 7 7 7 7 96.66667
```

Thinking about this a little more, it's like your taking an average of `(x - 1) / 6`

and then multiplying by 100. The following function is even simpler and yields the same result:

```
f2 = function(x) {
if (any(colSums(matrix(is.na(x), nrow = 3)) == 3)) return(NA)
mean((x - 1) / 6, na.rm = T) * 100
}
res$Index = apply(df, 1, f2)
```

**Edit:** This average is a much nicer way to put your very round-about description:

The calculation for the Index above is [the sum for items q1:q15 minus the minimum possible sum (since my scale is 1-7 this would be the number of non-missing items * 1)] divided by [the maximum possible sum (on a scale of 1-7 the number of non-missing items * 7) - the minimum possible sum]. The output I'm trying to generate adds a new variable "Index" with a value in that colum for each row:

So basically, you are scoring things from 1-7 and you want the average percent correct. First we subtract off 1 to make the scores 0-6 (because 0 is a nicer minimum than 1), then we divide by 6 to make the scores decimal between 0-1, and then we average, ignoring missing values of course. Multiply by 100 to convert to percentage. This is the code above:

```
mean((x - 1) / 6, na.rm = T) * 100
```