markus müller markus müller - 5 days ago 5
R Question

How to compute standard deviation leaving out rows identified by levels of two other vectors in R?

I compute standard deviation for firms in the same sector and same region with the aggregate argument

sd_lsales <- aggregate(lsales, by=list(region, sector), function(x) sd (x, na.rm=TRUE))


for a dataframe like:

region sector lsales
1 Vienna Food 363000000
2 Vienna Other Manufacturing 5930000
3 Vienna Food 150000000
...
505 Berlin Food 39200000
506 Berlin Manufacturing 203900000
507 Berlin Manufacturing 298000000
508 Berlin IT 30339200

The whole data.frame consists of
852 Regions
52 Sectors
28052 observations


Is there any way to compute standard deviations for all other industries in the same region except for the sector mentioned in the same row?

For example in Row 1 I want to compute the standard deviation of lsales (=”log sales”) for all sectors in Vienna except for sector “Food”.
In Row 506 I want standard deviation of lsales for all sectors in Berlin except for sector “Manufacturing” etc.

As I have to do this for 852 Regions, 52 sectors I try to figure out if it is possible to achieve this with something like a “loop” or “if else” ... argument?

Answer

We can try

df1$sd_lsales <- unsplit(lapply(split(df1[-1], df1$Region), function(x) {
                  sd1 <- sapply(unique(x$Sector), function(y) {
                  i1 <- x$Sector!=y
                  sd(x$lsales[i1])

                    })
                 sd1[x$Sector]
               }

             ), df1$Region)

data

df1 <- structure(list(Region = c("Vienna", "Vienna", "Vienna", "Berlin", 
"Berlin", "Berlin", "Berlin"), Sector = c("Food", "Other Manufacturing", 
 "Food", "Food", "Manufacturing", "Manufacturing", "IT"), lsales = c(363000000L, 
5930000L, 150000000L, 39200000L, 203900000L, 298000000L, 30339200L
 )), .Names = c("Region", "Sector", "lsales"), class = "data.frame", row.names = c("1", 
"2", "3", "505", "506", "507", "508"))
Comments