Alwin - 1 year ago 81

R Question

I've written code that takes an input vector, creates a dataframe based on the input, optimises some values and returns some of them. I'm now turning this into a function that applies the calculations row-wise to an input dataframe. Below is a minimal working example of what I would like to achieve (my actual function would be too long to share here!):

```
# Randomly generated dataframe
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Function that takes multiple arguments and returns multiple values in a list
zsummary <- function(x, y) {
  if (y < 0) return(list(NA, NA))
  z = rnorm(10, x, abs(y))
  return(list(mean(z), sd(z)))
}

# Example of something that works using dplyr
# However, this results in a lot of function calls...
# especially if there were a lot of columns in the list...
library(dplyr)
df %>% rowwise() %>%
  mutate(mean = zsummary(x,y)[[1]], sd = zsummary(x,y)[[2]])
```

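As you can see, I can't apply individual functions to each new column (`df$mean` and `df$sd`), because both have to come from the same `z`. Is there a way to do this with `apply` or `dplyr`, without resorting to `apply` inside a `for` loop and `rbind`?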

Answer

We can use `mapply` for this. As `zsummary` takes two arguments, `mapply` is one option: it takes the corresponding elements of `x` and `y` and applies `zsummary` to each pair.

```
t(mapply(zsummary, df$x, df$y))
```
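Because `zsummary` returns a list, the `mapply` result above is a matrix whose cells are list elements rather than plain numbers. The following is a minimal sketch of one way to attach the two values to `df` as numeric columns, calling the function only once per row. The fixed seed and the intermediate name `res` are assumptions made for illustration and are not part of the original answer.

```r
# Assumption: a fixed seed, used only so this sketch is reproducible
set.seed(42)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Same function as in the question: two values returned in a list
zsummary <- function(x, y) {
  if (y < 0) return(list(NA, NA))
  z <- rnorm(10, x, abs(y))
  list(mean(z), sd(z))
}

# One call per row; the result is a 10 x 2 matrix of list elements
res <- t(mapply(zsummary, df$x, df$y))

# Flatten each column of the list-matrix into an ordinary vector
df$mean <- unlist(res[, 1])
df$sd   <- unlist(res[, 2])
```

Both new columns come from the same draw of `z` for each row, which is what calling `zsummary` twice inside `mutate` cannot guarantee.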

We can also change the function slightly so that it returns a one-row data frame, and get the output with `dplyr`:

```
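# Return a one-row data frame so the new columns carry their own names
zsummary <- function(x, y) {
  if (y < 0) return(data.frame(mean = NA, sd = NA))
  z <- rnorm(10, x, abs(y))
  data.frame(mean = mean(z), sd = sd(z))
}

# Call zsummary once per row and bind its result to that row
df %>%
  rowwise() %>%
  do(data.frame(., zsummary(.$x, .$y)))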
```
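For comparison, the same "return a one-row data frame" idea can be sketched in base R without `dplyr`. This is a swapped-in alternative and not the method from the answer: it uses `Map` to call the function once per row and `do.call(rbind, ...)` to stack the results. The names `zsummary_df`, `rows` and `out`, and the fixed seed, are hypothetical and chosen only for this illustration.

```r
# Assumption: a fixed seed, used only so this sketch is reproducible
set.seed(42)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Hypothetical name; same logic as the data-frame-returning zsummary
zsummary_df <- function(x, y) {
  if (y < 0) return(data.frame(mean = NA_real_, sd = NA_real_))
  z <- rnorm(10, x, abs(y))
  data.frame(mean = mean(z), sd = sd(z))
}

# One call per (x, y) pair, giving a list of one-row data frames
rows <- Map(zsummary_df, df$x, df$y)

# Stack the one-row results and bind them next to the input columns
out <- cbind(df, do.call(rbind, rows))
```

Stacking many small data frames with `rbind` can be slow for large inputs, so this is meant as an illustration of the idea rather than a tuned solution.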

Or, as we discussed in the comments, instead of having the function take multiple arguments, give it a single argument and use `apply` with `MARGIN = 1` to apply it to each row.

```
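# v1 is one row of the input: v1[1] is x, v1[2] is y
zsummary2 <- function(v1) {
  if (v1[2] < 0) return(c(mean = NA, sd = NA))
  z <- rnorm(10, v1[1], abs(v1[2]))
  # Summarise the simulated values z, not the input row itself
  c(mean = mean(z), sd = sd(z))
}

# Drop column a, apply row by row, and transpose to one row per input row
t(apply(df[-1], 1, zsummary2))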
#           mean        sd
# [1,]  1.403066 0.8757504
# [2,]  5.058188 5.1401507
# [3,]  4.288365 1.4194393
# [4,]  1.932829 6.7587054
# [5,] -1.864236 3.7587462
# [6,]        NA        NA
# [7,]  3.328629 1.3711950
# [8,] -2.347699 5.0449958
# [9,]  2.936615 1.7332283
#[10,]        NA        NA
```

NOTE: The values will differ on each run, as we didn't set a seed before calling `rnorm`.
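If reproducible numbers are wanted, one option is to call `set.seed` before the random draws. The sketch below assumes the single-argument version of the function with the summary computed from `z`, and uses arbitrary seed values. `run1` and `run2` are hypothetical names for two repeated runs.

```r
# Single-argument version: v1[1] is x, v1[2] is y
zsummary2 <- function(v1) {
  if (v1[2] < 0) return(c(mean = NA, sd = NA))
  z <- rnorm(10, v1[1], abs(v1[2]))
  c(mean = mean(z), sd = sd(z))
}

# Assumption: arbitrary seed so the input data frame is reproducible
set.seed(123)
df <- data.frame(a = rnorm(10, 0, 1), x = rnorm(10, 1, 3), y = rnorm(10, 2, 3))

# Resetting the same seed before each run replays the same random draws
set.seed(1)
run1 <- t(apply(df[-1], 1, zsummary2))
set.seed(1)
run2 <- t(apply(df[-1], 1, zsummary2))

# Compare the two runs
identical(run1, run2)
```

With the seed reset before each run, the comparison should return `TRUE`. Without the second `set.seed` call the two runs would generally differ.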