T-T - 1 year ago 112
R Question

R Conditional standard deviation

I have a large data set and I need the standard deviation of the `Main` column, computed separately for each other column over the rows where that column is not `NA`. Here is a sample data set:

``````
df1 <- data.frame(
  Main = c(0.33, 0.57, 0.60, 0.51),
  B = c(NA, NA, 0.09, 0.19),
  C = c(NA, 0.05, 0.07, 0.05),
  D = c(0.23, 0.26, 0.23, 0.26)
)

View(df1)
#   Main    B       C       D
# 1 0.33    NA      NA      0.23
# 2 0.57    NA      0.05    0.26
# 3 0.60    0.09    0.07    0.23
# 4 0.51    0.19    0.05    0.26
``````

Take column `B` as an example: since rows 1 and 2 are `NA`, its standard deviation will be `sd(df1[3:4,1])`. For columns `C` and `D` it will be `sd(df1[2:4,1])` and `sd(df1[1:4,1])`, respectively. Therefore, the result will be:

``````
#     B       C       D
# 1   0.06    0.05    0.12
``````
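As a quick check of the expected values, the three calls described above can be written out by hand. This is only a sketch of the target result using the sample data from the question; the name `expected` is introduced here for illustration.

```r
df1 <- data.frame(
  Main = c(0.33, 0.57, 0.60, 0.51),
  B = c(NA, NA, 0.09, 0.19),
  C = c(NA, 0.05, 0.07, 0.05),
  D = c(0.23, 0.26, 0.23, 0.26)
)

# SD of Main over the rows where each column is non-NA, spelled out per column
expected <- c(
  B = sd(df1[3:4, 1]),  # B is non-NA in rows 3-4
  C = sd(df1[2:4, 1]),  # C is non-NA in rows 2-4
  D = sd(df1[1:4, 1])   # D has no NA
)

round(expected, 2)  # B = 0.06, C = 0.05, D = 0.12
```

Hard-coding the row ranges like this does not scale to many columns, which is why a per-column filter on `NA` is needed.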

I tried the following, but it returned only one number, `0.0636`:

``````
df2 <- df1[,-1]!=0

sd(df1[df2,1], na.rm = T)
``````
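A likely reason for the single number is that `sd()` reduces whatever vector it is given to one value, while `df1[,-1] != 0` produces one logical matrix covering all three columns rather than a separate mask per column. The sketch below only inspects those two facts; `mask` is a name introduced here for illustration.

```r
df1 <- data.frame(
  Main = c(0.33, 0.57, 0.60, 0.51),
  B = c(NA, NA, 0.09, 0.19),
  C = c(NA, 0.05, 0.07, 0.05),
  D = c(0.23, 0.26, 0.23, 0.26)
)

# Comparing a data frame with a scalar gives a single logical matrix,
# with NA wherever the data is NA
mask <- df1[, -1] != 0
is.matrix(mask)        # TRUE
dim(mask)              # 4 rows, 3 columns

# Number of usable (non-NA) rows differs per column: B = 2, C = 3, D = 4
colSums(!is.na(mask))

# sd() always collapses its input to one number
length(sd(df1$Main))   # 1
```

Getting one value per column therefore takes one `sd()` call per column, each with its own row filter.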

My data set has many more columns, and I'm wondering if there is a more efficient way to get it done? Many thanks!

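``````
# For each column except Main, take the SD of Main over the rows where that column is not NA
sapply(df1[,-1], function(x) sd(df1[!is.na(x), 1]))
``````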