BWRT BWRT - 3 months ago 14
R Question

R ddply row summary statistics

For each row in my data frame below (defined by

FID_Bounda
,
NAME
,
DESCRIPTIO
&
SOVEREIGNT
) I'm trying to calculate the mean, standard deviation and the co-efficient of variation across all the values in each column starting with
crN
.

structure(list(FID_Bounda = 0:7, NAME = c("Bedfordshire", "Berkshire",
"Bristol", "Buckinghamshire", "Cambridgeshire", "Cheshire", "Derbyshire",
"Devon"), DESCRIPTIO = c("Ceremonial County", "Ceremonial County",
"Ceremonial County", "Ceremonial County", "Ceremonial County",
"Ceremonial County", "Ceremonial County", "Ceremonial County"
), SOVEREIGNT = c("England", "England", "England", "England",
"England", "England", "England", "England"), crN1 = c(61.944107636,
38.769347117, 0.810167027, 63.721241962, 191.046323469, 81.467146994,
61.65529268, 288.751788714), crN10 = c(60.33595964, 38.326639788,
0.834289164, 63.009539538, 185.25772542, 82.936101454, 61.985178493,
304.951827268), crN100 = c(53.385110882, 33.530058107, 0.739041324,
55.601839364, 165.604271128, 76.386014559, 55.591194915, 284.739586188
), crN1000 = c(58.397452282, 37.277298648, 0.820739862, 61.716749153,
175.436497697, 82.461823706, 61.762203751, 321.414544333)), .Names = c("FID_Bounda",
"NAME", "DESCRIPTIO", "SOVEREIGNT", "crN1", "crN10", "crN100",
"crN1000"), row.names = c(NA, 8L), class = "data.frame")


I tried to use the code outlined on cookbook-r to derive these values:

cdata <- ddply(uadt, c("FID_Bounda","NAME","DESCRIPTIO","SOVEREIGNT"), summarise,
N = length(grep("crN", names(uadt), value = T)),
mean = mean(grep("crN", names(uadt), value = F)),
sd = sd(grep("crN", names(uadt), value = F)),
se = sd / sqrt(N)
)
cdata


Which correctly counts the total N of
crN
columns but it gives the same mean, sd and se for every row. Any help as to where the issue lies would be much appreciated as the real data set has 1000 columns all with the same naming pattern of
crNnumber
.

Answer

I know this is not the perfect answer, but it might be worth using more up-to-date tools (equally I know the irony in this statement as my answer doesn't use tidyr). But the approach I would take is:

library(reshape2)
madt <- melt(uadt, 
             id.vars = c("FID_Bounda", "NAME", 
                         "DESCRIPTIO", "SOVEREIGNT"))
library(dplyr)
cdata <- summarise(group_by(madt,
                            FID_Bounda, NAME, 
                            DESCRIPTIO, SOVEREIGNT), 
                   N = n_distinct(variable), 
                   mean = mean(value), 
                   sd = sd(value), 
                   se = sd / sqrt(N))

This does produce the correct output