Aaron - 10 months ago 65

R Question

I would like to make a groupwise

`summarise()`

I have count data that looks like this. The concentration and the standard deviation are calculated like this:

`library(dplyr)`

testdata <- data_frame(sample = sort(rep(1:3, 4)),

volume = rep(c(1e-1, 1e-1, 1e-2, 1e-2), 3),

count = c(400, 400, 40, 40, 0, 0, 0, 0, 400, 400, 400, 400))

testdata %>%

group_by(sample) %>%

summarise(concentration = sum(count) / sum(volume),

sd = sqrt(sum(count)))

However, when making the calculation only counts with values between 25-250 are to be included. which I could achieve with:

`testdata %>%`

group_by(sample) %>%

filter((count >= 25) & (count <= 250)) %>%

summarise(concentration = sum(count) / sum(volume),

sd = sqrt(sum(count)))

But then samples 2 & 3 have no concentration.

The edge cases for each group might be calculated with something like:

`if (all(count <= 25)){`

summarise(concentration = 25 / min(volume),

sd = NA)

}

else if (all(count >= 250)){

summarise(concentration = 250 / max(volume),

sd = NA)

}

Can such edge cases be integrated into the

`summarise()`

I would ideally also like a flag to indicate an edge case which returns result = "OK" for all cases except edge cases that return:

`if (all(count <= 25)){`

summarise(concentration = 25 / min(volume),

sd = NA,

result = "LOW")

}

else if (all(count >= 250)){

summarise(concentration = 250 / max(volume),

sd = NA,

result = "HIGH")

}

Answer Source

One way is to encode your logic within `summarise`

using `ifelse`

:

```
library(dplyr)
result <- testdata %>% group_by(sample) %>%
summarise(concentration = ifelse(all(count <= 25),
25 / min(volume),
ifelse(all(count >= 250),
250 / max(volume),
sum(count) / sum(volume))),
sd = ifelse(all(count <= 25),
NA,
ifelse(all(count >= 250),
NA,
sqrt(sum(count)))),
result = ifelse(all(count <= 25),
"LOW",
ifelse(all(count >= 250),
"HIGH",
"OK")))
print(result)
### A tibble: 3 x 4
## sample concentration sd result
## <int> <dbl> <dbl> <chr>
##1 1 4000 29.66479 OK
##2 2 2500 NA LOW
##3 3 2500 NA HIGH
```

Another approach, which is hopefully closer to what the OP asks, is to define a function:

```
summarise.func <- function(count, volume) {
if (all(count <= 25)) {
concentration <- 25 / min(volume)
sd <- NA
result <- "LOW"
} else if (all(count >= 250)) {
concentration <- 250 / max(volume)
sd <- NA
result <- "HIGH"
} else {
concentration <- sum(count) / sum(volume)
sd <- sqrt(sum(count))
result <- "OK"
}
data.frame(concentration=concentration, sd=sd, result=result, stringsAsFactors=FALSE)
}
```

that handles both the regular case and the edge cases. The key is that this function return a `data.frame`

containing the summarized results. Then, `summarise`

will create a column that is a list containing these data frames that can then be `tidyr::unnest`

ed:

```
library(dplyr)
library(tidyr)
result <- testdata %>% group_by(sample) %>%
summarise(csr=list(f(count, volume))) %>%
unnest(csr)
print(result)
### A tibble: 3 x 4
## sample concentration sd result
## <int> <dbl> <dbl> <chr>
##1 1 4000 29.66479 OK
##2 2 2500 NA LOW
##3 3 2500 NA HIGH
```