user3702510 - 1 year ago 103

R Question

I have a dataset with numeric values and a categorical variable. The distribution of the numeric variable differs for each category. I want to plot a density curve for each category so that the curves sit visually below the density of the entire dataset.

This is similar to plotting the components of a mixture model, but without fitting the mixture model (I already know the categorical variable that splits the data).

If I let ggplot group by the categorical variable, each of the four curves (the overall one and one per species) is a proper density and integrates to one.

```
library(ggplot2)

ggplot(iris, aes(x = Sepal.Width)) +
  geom_density() +
  # colour must be mapped to the column Species, not the string 'Species'
  geom_density(aes(group = Species, colour = Species))
```

What I want is for the density of each category to be a sub-density (not integrating to 1), similar to the following code (which I only implemented for two of the three iris species):

```
library(data.table)

myIris <- as.data.table(iris)

# calculate density for entire dataset
dens_entire <- density(myIris[, Sepal.Width], cut = 0)
dens_e <- data.table(x = dens_entire$x, y = dens_entire$y)

# calculate density for dataset with setosa
dens_setosa <- density(myIris[Species == 'setosa', Sepal.Width], cut = 0)
dens_sa <- data.table(x = dens_setosa$x, y = dens_setosa$y)

# calculate density for dataset with versicolor
dens_versicolor <- density(myIris[Species == 'versicolor', Sepal.Width], cut = 0)
dens_v <- data.table(x = dens_versicolor$x, y = dens_versicolor$y)

# plot densities as mixture model (divisors 2.5 and 1.65 are hard-coded)
ggplot(dens_e, aes(x = x, y = y)) +
  geom_line() +
  geom_line(data = dens_sa, aes(x = x, y = y / 2.5, colour = 'setosa')) +
  geom_line(data = dens_v, aes(x = x, y = y / 1.65, colour = 'versicolor'))
```

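resulting in a plot of the overall density together with the two down-scaled species curves.

Above I hard-coded the divisors (2.5 and 1.65) that shrink the y values. Is there a way to do this with ggplot, or to calculate the scaling factors?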

Thanks for your ideas.

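Answer

Do you mean something like this? Mapping `y` to the computed count (the density multiplied by the number of observations in each group) makes each species curve proportional to its group size. Note that the y axis then shows counts rather than density, so you need to change the scale.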

```
# after_stat(count) is the current spelling of ..count.. (ggplot2 >= 3.3.0)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density(aes(y = after_stat(count))) +
  geom_density(aes(y = after_stat(count),
                   group = Species, colour = Species))
```
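To stay on the density scale instead of the count scale, one possibility is to divide the computed count by the total number of rows. This is only a sketch: `n_total` and `p` are names introduced here for illustration, and `after_stat(count)` is the newer spelling of `..count..`. Because `count` is the group density multiplied by the group size, `count / n_total` is each group's density weighted by its share of the data.

```r
library(ggplot2)

# Total number of observations (150 for iris); name chosen for this sketch
n_total <- nrow(iris)

p <- ggplot(iris, aes(x = Sepal.Width)) +
  # overall density, integrates to (roughly) 1
  geom_density() +
  # per-species curves: density * n_group / n_total
  geom_density(aes(y = after_stat(count / n_total), colour = Species))

p
```

By default each group gets its own bandwidth, so the species curves need not add up exactly to the overall curve.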

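Another option may be to divide each species density by 3, which works here because every species accounts for exactly one third (50 of 150) of the rows: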

```
# after_stat(density) is the current spelling of ..density.. (ggplot2 >= 3.3.0)
ggplot(iris, aes(x = Sepal.Width)) +
  geom_density(aes(y = after_stat(density))) +
  geom_density(aes(y = after_stat(density) / 3,
                   group = Species, colour = Species))
```
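The divisor 3 only fits equally sized groups. For the "calculate it" part of the question, a minimal base R sketch (not part of the original answer) is to weight each group's density by its share of the rows. The names `bw`, `grid`, `dens_all`, `sub_dens` and `p2` are made up for this example. It also assumes that one common bandwidth and evaluation grid are acceptable, since that is what makes the sub-densities add up to the overall density.

```r
library(ggplot2)

x      <- iris$Sepal.Width
groups <- iris$Species

# One bandwidth and one evaluation range shared by every curve
bw   <- bw.nrd0(x)   # the default bandwidth rule used by density()
grid <- range(x)

# Density of the entire dataset
dens_all <- density(x, bw = bw, from = grid[1], to = grid[2])

# Per-group densities, each scaled by n_group / n_total
sub_dens <- do.call(rbind, lapply(levels(groups), function(g) {
  xg <- x[groups == g]
  d  <- density(xg, bw = bw, from = grid[1], to = grid[2])
  data.frame(Species = g, x = d$x, y = d$y * length(xg) / length(x))
}))

p2 <- ggplot() +
  geom_line(data = data.frame(x = dens_all$x, y = dens_all$y),
            aes(x = x, y = y)) +
  geom_line(data = sub_dens, aes(x = x, y = y, colour = Species))

p2
```

With a shared bandwidth the scaled curves sum to the overall curve at every grid point, which is the mixture-style picture described in the question.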