solomo31 - 1 year ago 109
R Question

Find Frequency for Multiple Variables/Columns Based on Specific Conditions in R

I'm pretty new to R and I'm trying to figure out how to write code to get the frequency for multiple columns based on different conditions.

Example Data

```
ID          Group Age  Gender Total_T Neg_Mood_T Interpersonal_Prob_T col7  col8  col9
6000-01-00  0     9    1      44.00   49.00      42.00                44.00 48.00 40.00
6000-02-00  0     12   1      53.00   54.00      42.00                59.00 52.00 51.00
6000-03-00  0     7    2      72.00   50.00      56.00                58.00 81.00 84.00
6000-04-00  0     7    1      41.00   44.00      49.00                47.00 41.00 40.00
6000-05-00  0     9.5  1      38.00   44.00      42.00                39.00 41.00 40.00
6000-06-00  1     8    1      39.00   38.00      57.00                39.00 41.00 40.00
6000-07-00  1     9    1      38.00   44.00      42.00                39.00 41.00 40.00
6000-08-00  1     18   1      41.00   44.00      44.00                48.00 41.00 40.00
6000-09-00  1     9    2      58.00   54.00      45.00                47.00 69.00 56.00
6000-10-00  1     11   2      42.00   40.00      45.00                47.00 46.00 40.00
```

So I started with simple code that counts how often the values of one variable meet a condition:

condition 1:

```
Total_T <- sum(data$Total_T[data$Group==0]>=60, na.rm=TRUE)
```

condition 2:

```
Total_T <- sum(data$Total_T[data$Group==0]<60, na.rm=TRUE)
```
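One way to avoid repeating that line for every variable (a base-R sketch, not part of the original post) is to compare several columns with the cutoff at once and count the `TRUE` values per column with `colSums()`. The block rebuilds only the first seven columns of the example data as `data`; `score_cols`, `rows`, `high` and `low` are illustrative names.

```r
# Assumption: the question's example data, first seven columns only
data <- data.frame(
  ID = sprintf("6000-%02d-00", 1:10),
  Group = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age = c(9, 12, 7, 7, 9.5, 8, 9, 18, 9, 11),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Illustrative names: the columns to count and the row condition
score_cols <- c("Total_T", "Neg_Mood_T", "Interpersonal_Prob_T")
rows <- data$Group == 0

# Comparing a data frame with a number gives a logical matrix;
# colSums() then counts the TRUE values in each column.
high <- colSums(data[rows, score_cols] >= 60, na.rm = TRUE)  # condition 1
low  <- colSums(data[rows, score_cols] < 60, na.rm = TRUE)   # condition 2
high
low
```

With the example data, `high` is 1, 0, 0 and `low` is 4, 5, 5 for the three columns; changing `rows` (for example `data$Group == 0 & data$Gender == 1`) changes the condition without touching the rest.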

However, I need to repeat this for several more variables and conditions (e.g., condition 1 would be repeated for 4 more variables, as would condition 2, and so forth), and I'd like to find a more efficient way to do it.

So I'm hoping to write code that returns the frequency for Total_T, Neg_Mood_T, etc., based on the conditions I place on Group, Age and Gender.
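To get the counts for every combination of grouping variables in one call, `aggregate()` with a `cbind()` formula is one base-R option (a sketch, not from the original answers). Only Group and Gender are used as groups here; Age is continuous, so it would first need to be binned, for example with `cut()`. The data frame is the question's example (first seven columns only) and `counts` is an illustrative name.

```r
# Assumption: the question's example data, first seven columns only
data <- data.frame(
  ID = sprintf("6000-%02d-00", 1:10),
  Group = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age = c(9, 12, 7, 7, 9.5, 8, 9, 18, 9, 11),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# For every Group x Gender combination, count the scores >= 60 in each column.
# Note: the formula interface drops rows with NA in any listed column (na.omit).
counts <- aggregate(
  cbind(Total_T, Neg_Mood_T, Interpersonal_Prob_T) ~ Group + Gender,
  data = data,
  FUN = function(v) sum(v >= 60)
)
counts
```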

I've tried `data.frame(table())` and `ddply`, but I'm honestly stumped.

Thanks!
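Since `table()` came up in the question: it can cross-tabulate a grouping variable against a logical condition, which gives both frequencies (condition 1 and condition 2) for one variable at once. This is a sketch on the first seven columns of the example data; `tab` and `freq_df` are illustrative names.

```r
# Assumption: the question's example data, first seven columns only
data <- data.frame(
  ID = sprintf("6000-%02d-00", 1:10),
  Group = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age = c(9, 12, 7, 7, 9.5, 8, 9, 18, 9, 11),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Rows: Group; columns: whether Total_T >= 60 (TRUE = condition 1, FALSE = condition 2).
# table() leaves out NA comparisons by default.
tab <- table(Group = data$Group, High_Total_T = data$Total_T >= 60)
tab

# Long format (Group, High_Total_T, Freq), like the data.frame(table()) attempt
freq_df <- as.data.frame(tab)
freq_df
```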

We can use `subset` to get the part of the data we need, then `sum`:

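```
x1 <- subset(data, Group == 0 & Gender == 1, select = "Total_T")
sum(x1[x1 >= 60], na.rm = TRUE)  # sum of the Total_T values >= 60
sum(x1[x1 < 60], na.rm = TRUE)   # sum of the Total_T values < 60

# Wrapped in a function
fun <- function(cols) {
  # Keep only the rows of interest and the requested column
  x1 <- subset(data, Group == 0 & Gender == 1, select = cols)
  # Sum the values below 60 (see the Edit below for counting instead)
  sum(x1[x1 < 60], na.rm = TRUE)
}

fun("Total_T")
# [1] 176
fun("Neg_Mood_T")
# [1] 191
```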
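`fun()` above returns the sum of the matching values. If a count is wanted instead (see the Edit), a variant could compare the subset with a cutoff and count with `colSums()`, which also handles several columns in one call. `count_fun` and its `df` and `cutoff` arguments are hypothetical, and the example data (first seven columns) is rebuilt so the block runs on its own.

```r
# Assumption: the question's example data, first seven columns only
data <- data.frame(
  ID = sprintf("6000-%02d-00", 1:10),
  Group = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age = c(9, 12, 7, 7, 9.5, 8, 9, 18, 9, 11),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Hypothetical variant of fun(): counts, rather than sums, the values below
# `cutoff` in each requested column, for rows with Group == 0 and Gender == 1.
count_fun <- function(df, cols, cutoff = 60) {
  x1 <- subset(df, Group == 0 & Gender == 1, select = cols)
  colSums(x1 < cutoff, na.rm = TRUE)
}

count_fun(data, "Total_T")
count_fun(data, c("Total_T", "Neg_Mood_T"), cutoff = 50)
```

With the example data this gives 4 for `Total_T`, and 3 for each column with `cutoff = 50`.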

If you would like to get all the columns in one shot, you can use:

```
library(dplyr)
# across() requires dplyr >= 1.0.0
data %>%
  filter(Group == 0 & Gender == 1) %>%
  summarise(across(-(1:4), ~ sum(.x[.x < 60], na.rm = TRUE)))
#   Total_T Neg_Mood_T Interpersonal_Prob_T col7 col8 col9
# 1     176        191                  175  189  182  171
```
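The dplyr call above also sums values. A counting version (a sketch, assuming dplyr 1.0.0 or later for `across()`) sums the logical condition instead. The data frame here is the question's example with only the first seven columns, so the columns are selected by name; `res` is an illustrative name.

```r
library(dplyr)

# Assumption: the question's example data, first seven columns only
data <- data.frame(
  ID = sprintf("6000-%02d-00", 1:10),
  Group = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Age = c(9, 12, 7, 7, 9.5, 8, 9, 18, 9, 11),
  Gender = c(1, 1, 2, 1, 1, 1, 1, 1, 2, 2),
  Total_T = c(44, 53, 72, 41, 38, 39, 38, 41, 58, 42),
  Neg_Mood_T = c(49, 54, 50, 44, 44, 38, 44, 44, 54, 40),
  Interpersonal_Prob_T = c(42, 42, 56, 49, 42, 57, 42, 44, 45, 45)
)

# Count, per column, how many values are below 60 for Group 0, Gender 1
res <- data %>%
  filter(Group == 0, Gender == 1) %>%
  summarise(across(Total_T:Interpersonal_Prob_T, ~ sum(.x < 60, na.rm = TRUE)))
res
```

Replacing `filter()` with `group_by(Group, Gender)` would return one row of counts per group instead.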

Edit

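There is a difference between summing the values of `Total_T` that meet a condition and counting how many values meet it. Since the question asks for a frequency, the second form (summing the logical condition itself, as in the question's own code) is the one that gives a count. We can show the difference with an example: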

```
x <- 1:10

# condition: a logical vector, TRUE where x > 5
x > 5

# 1. sum the values that meet the condition (6 + 7 + 8 + 9 + 10)
sum(x[x > 5])
# [1] 40

# 2. count how many values meet the condition
sum(x > 5)
# [1] 5
```
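The second form works because R coerces `TRUE` to 1 and `FALSE` to 0 when summing, so `sum()` on a logical vector counts the `TRUE` values. A short check:

```r
x <- 1:10

as.integer(x > 5)   # FALSE/TRUE become 0/1: 0 0 0 0 0 1 1 1 1 1
sum(x > 5)          # number of TRUE values: 5
length(x[x > 5])    # same count, via the selected values themselves: 5
```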