Daniel Daniel - 22 days ago 12
R Question

Statistics of 150 variables in data frame

I have two data frames: one with 181 obs. and 521 variables, and another with two variables and 150 obs. The first data frame is the complete data, and the second holds the names of the continuous variables in the first data frame. I want to calculate the mean, variance, median and other statistics for the continuous variables. For example:

df1:
ha_be me_cu par_pri fer_ex
1 1000 300 5
0 500 150 7
0 300 400 5
0 900 80 6
1 2100 50 3
1 3400 60 2
0 390 800 1
1 400 750 4

df_cont:
Cod variable.names
3.2 me_cu
3.3 par_pri


How can I extract all the continuous variables from df1 using the names in df_cont and calculate all the basic statistics? I was trying with a for loop, but it doesn't work correctly.

Answer

We can use select_ to keep only the columns of 'df1' that are listed in 'df_cont', and then use summarise_each to compute the basic statistics for each of them.

library(dplyr)
df1 %>% 
     select_(.dots = df_cont$variable.names) %>% 
     summarise_each(funs(mean, sum)) #specify the functions

If 'variable.names' is a factor, convert it to character first: select_(.dots = as.character(df_cont$variable.names))
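
In recent dplyr releases, select_(), summarise_each() and funs() are deprecated. A minimal sketch of the same idea with the current verbs follows, assuming dplyr 1.0.0 or later; the data is copied from the question, and the names cont_vars and stats are only illustrative. It also adds var and median, which the question asked for.

```r
library(dplyr)

# Sample data copied from the question
df1 <- data.frame(
  ha_be   = c(1, 0, 0, 0, 1, 1, 0, 1),
  me_cu   = c(1000, 500, 300, 900, 2100, 3400, 390, 400),
  par_pri = c(300, 150, 400, 80, 50, 60, 800, 750),
  fer_ex  = c(5, 7, 5, 6, 3, 2, 1, 4)
)
df_cont <- data.frame(
  Cod = c(3.2, 3.3),
  variable.names = c("me_cu", "par_pri")
)

# as.character() guards against variable.names being stored as a factor
cont_vars <- as.character(df_cont$variable.names)

stats <- df1 %>%
  select(all_of(cont_vars)) %>%            # keep only the continuous columns
  summarise(across(everything(),           # apply every function to every column
                   list(mean = mean, var = var, median = median)))

print(stats)
```

The result is a one-row data frame whose columns are named by variable and statistic, for example me_cu_mean and par_pri_median.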

In addition to passing functions individually to summarise_each, we can also call summary to get the min, quartiles, median, mean and max of each selected column.

df1 %>% 
     select_(.dots = df_cont$variable.names) %>% 
     summary
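
If dplyr is not available, the same subset-then-summarise step can be sketched in base R, which also replaces the for loop mentioned in the question. This is only one possible approach; the data is copied from the question, and the names cont_vars and stats are illustrative.

```r
# Sample data copied from the question
df1 <- data.frame(
  ha_be   = c(1, 0, 0, 0, 1, 1, 0, 1),
  me_cu   = c(1000, 500, 300, 900, 2100, 3400, 390, 400),
  par_pri = c(300, 150, 400, 80, 50, 60, 800, 750),
  fer_ex  = c(5, 7, 5, 6, 3, 2, 1, 4)
)
df_cont <- data.frame(
  Cod = c(3.2, 3.3),
  variable.names = c("me_cu", "par_pri")
)

# as.character() guards against variable.names being stored as a factor
cont_vars <- as.character(df_cont$variable.names)

# Index df1 by name, then compute several statistics per column.
# The result is a matrix: one row per statistic, one column per variable.
# Add na.rm = TRUE to each call if the real data contains missing values.
stats <- sapply(df1[cont_vars], function(x) {
  c(mean = mean(x), var = var(x), median = median(x),
    min = min(x), max = max(x))
})

print(stats)
```

Individual values can then be looked up by name, for example stats["median", "me_cu"].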