Ben - 1 year ago 70

R Question

I have several sets of data stored in a data frame. For the sake of this question, I provide below a way to generate this data frame, but in real life I only have the `merged` data frame.

```
library(ggplot2)

x <- seq.POSIXt(from = strptime("1970-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S"),
                to = strptime("1970-01-01 00:05:00", format = "%Y-%m-%d %H:%M:%S"),
                by = "10 sec")
x <- rep(x, each = 3)

y <- c()
set.seed(1)
for (i in 1:length(x)) {
  y <- c(y, runif(1, min = 0, max = i))
}
my.data.frame1 <- data.frame(x, y, data = as.factor("data1"))

y <- c()
for (i in 1:length(x)) {
  y <- c(y, runif(1, min = length(x) - i, max = length(x)))
}
my.data.frame2 <- data.frame(x, y, data = as.factor("data2"))

merged <- rbind(my.data.frame1, my.data.frame2)

ggplot(merged, aes(x, y, color = data)) + geom_point() + geom_line()
```

So for each type of data (data1 and data2), and for each date value on the x axis, I have 3 y values.

The resulting plot looks bad.

What I want to do is to plot a `geom_ribbon` between the min and max values. I first tried to extract the min and max values with `aggregate`.

Can anyone help?

The code I tried with `aggregate`:

`aggregate(y ~ x, data = merged, max)`

(Same for the min.) But this does not distinguish between the data1 set and the data2 set. I know I could subset, but I guess it can be done using the "by" argument. I just couldn't make it work.

Answer

You were on the right track, and need to aggregate by both `data` and `x` instead of just `x`.

You can either calculate the `min` and `max` by group separately in two `aggregate` calls and then merge, or do both at the same time. For the second approach you'll need an additional step to get the output of the two functions into separate columns.

```
my.new.df = aggregate(y ~ data + x, data = merged, FUN = function(x) c(min = min(x), max = max(x)))
# Get the min and max as separate columns
my.new.df = as.data.frame(as.list(my.new.df))
ggplot(my.new.df, aes(x, fill = data)) +
  geom_ribbon(aes(ymin = y.min, ymax = y.max), alpha = 0.6)
```
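The first approach mentioned above, two separate `aggregate` calls followed by a merge, could be sketched roughly as follows. This is only an illustration: the small `merged` data frame built here is a stand-in for the one in the question, and the names `mins`, `maxs`, and `ranges` are made up for this example.

```r
# Small stand-in for the `merged` data frame from the question:
# two data sets, three y values per time stamp (assumed shape).
set.seed(1)
x <- rep(as.POSIXct("1970-01-01 00:00:00", tz = "UTC") + seq(0, 50, by = 10),
         each = 3)
merged <- rbind(
  data.frame(x, y = runif(length(x)), data = factor("data1")),
  data.frame(x, y = runif(length(x), min = 1, max = 2), data = factor("data2"))
)

# One aggregate call per summary, grouping by both data and x
mins <- aggregate(y ~ data + x, data = merged, FUN = min)
maxs <- aggregate(y ~ data + x, data = merged, FUN = max)

# Rename the summary columns so they do not clash in the merge
names(mins)[names(mins) == "y"] <- "y.min"
names(maxs)[names(maxs) == "y"] <- "y.max"

# Join the two summaries on the grouping columns
ranges <- merge(mins, maxs, by = c("data", "x"))
```

The resulting `ranges` data frame has the same `y.min` and `y.max` columns as `my.new.df`, so it should work with the same `geom_ribbon(aes(ymin = y.min, ymax = y.max))` call.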

You can also make the plot directly using `stat = "summary"`

in `geom_ribbon`

instead of making an aggregate dataset for plotting.

```
# ggplot2 >= 3.3.0 uses fun.max/fun.min; older versions use fun.ymax/fun.ymin
ggplot(merged, aes(x, y, fill = data)) +
  geom_ribbon(alpha = 0.6, stat = "summary", fun.max = max, fun.min = min)
```