solomo31 solomo31 - 1 month ago 9
R Question

Find Frequency for Multiple Variables/Columns Based on Specific Condition R

I'm pretty new to R and I'm trying to figure out how to write code to get the frequency for multiple columns based on different conditions.

Example Data

ID Group Age Gender Total_T Neg_Mood_T Interpersonal_Prob_T
6000-01-00 0 9 1 44.00 49.00 42.00 44.00 48.00 40.00
6000-02-00 0 12 1 53.00 54.00 42.00 59.00 52.00 51.00
6000-03-00 0 7 2 72.00 50.00 56.00 58.00 81.00 84.00
6000-04-00 0 7 1 41.00 44.00 49.00 47.00 41.00 40.00
6000-05-00 0 9.5 1 38.00 44.00 42.00 39.00 41.00 40.00
6000-06-00 1 8 1 39.00 38.00 57.00 39.00 41.00 40.00
6000-07-00 1 9 1 38.00 44.00 42.00 39.00 41.00 40.00
6000-08-00 1 18 1 41.00 44.00 44.00 48.00 41.00 40.00
6000-09-00 1 9 2 58.00 54.00 45.00 47.00 69.00 56.00
6000-10-00 1 11 2 42.00 40.00 45.00 47.00 46.00 40.00


I began with simple code that counts how often a variable meets a condition:

condition 1:

Total_T <- sum(data$Total_T[data$Group==0]>=60, na.rm=TRUE)


condition 2:

Total_T <- sum(data$Total_T[data$Group==0]<60, na.rm=TRUE)


However, I need to repeat this code many more times for different variables and different conditions (i.e. condition 1 would be repeated for 4 more variables, as would condition 2, and so forth), and I would like to make it more efficient.

So, I'm hoping to write code that returns the frequency for Total_T, Neg_Mood_T, etc. based on the conditions I place on Group, Age and Gender.

I've tried to use data.frame(table()) and ddply, but I'm honestly stumped.

Thanks !

Answer

We can use subset to keep the rows and the column we need, then sum the values on either side of the cutoff:

x1 <- subset(data, Group== 0 & Gender == 1, select="Total_T")
sum(x1[x1 >= 60], na.rm=TRUE)
sum(x1[x1 < 60], na.rm=TRUE)

#Wrapped in a function
fun <- function(cols) {
  x1 <- subset(data, Group== 0 & Gender == 1, select=cols)
  # sum the values below 60 (this is the condition the output below reflects)
  sum(x1[x1 < 60], na.rm=TRUE)
}  

fun("Total_T")
[1] 176
fun("Neg_Mood_T")
[1] 191

If you would like to get all the columns in one shot, you can use:

library(dplyr)
data %>% filter(Group == 0 & Gender == 1) %>%
  summarise_at(-(1:4), funs(sum(.[. < 60])))
# Total_T Neg_Mood_T Interpersonal_Prob_T col7 col8 col9
# 1     176        191                  175  189  182  171
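
For readers without dplyr, or on a recent release where funs() is deprecated, the same per-column sums can be sketched in base R. This is only an illustration: the data frame below is rebuilt by hand from the example rows in the question, it keeps just the three score columns that the question names, and score_cols and sums_below_60 are names made up for this sketch.

```r
# Example rows from the question; only the three named score columns are kept
# (the remaining score columns are unnamed in the question, so they are left out).
data <- data.frame(
  Group  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T              = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T           = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Hypothetical helper name: the columns to summarise
score_cols <- c("Total_T", "Neg_Mood_T", "Interpersonal_Prob_T")

# Keep the rows for Group 0 and Gender 1, and only the score columns
sub <- data[data$Group == 0 & data$Gender == 1, score_cols]

# For each column, sum the values that are below 60
sums_below_60 <- sapply(sub, function(v) sum(v[v < 60], na.rm = TRUE))
print(sums_below_60)
```

On these rows the result should agree with the first three columns of the dplyr output above.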

Edit

There is a difference between summing the values of Total_T that fit the condition and counting the number of times a value fits the condition. The code in the question does the second, which is what gives a frequency. We can show the difference with an example:

x <- 1:10

#condition
x > 5

#1. sum values fitting the condition
sum(x[x > 5])
[1] 40

#2. count the number of times a value fits the condition
sum(x > 5)
[1] 5
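
Applied to the question, the counting form can be run over several columns at once. The following is a minimal base R sketch, not the only way to do it: the data frame is rebuilt by hand from the example rows in the question with only the three named score columns, and count_at_least, score_cols and the cutoff argument are names invented for this illustration.

```r
# Example rows from the question; only the three named score columns are kept.
data <- data.frame(
  Group  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T              = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T           = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)
score_cols <- c("Total_T", "Neg_Mood_T", "Interpersonal_Prob_T")

# Hypothetical helper: for each column in cols, count the rows at or above cutoff.
# sum() over a logical vector counts the TRUE values.
count_at_least <- function(df, cols, cutoff = 60) {
  sapply(df[cols], function(v) sum(v >= cutoff, na.rm = TRUE))
}

# Frequencies for Group 0 only
group0 <- data[data$Group == 0, ]
high <- count_at_least(group0, score_cols)                               # >= 60
low  <- sapply(group0[score_cols], function(v) sum(v < 60, na.rm = TRUE)) # < 60

# Same count for every Group at once: rows are variables, columns are groups
by_group <- sapply(split(data, data$Group), count_at_least, cols = score_cols)

print(high)
print(low)
print(by_group)
```

Further conditions on Age or Gender can be added to the row filter in the same way as the Group condition.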