Matherion - 1 year ago 87
R Question

# How can I transform aesthetics 'on the fly' in ggplot using variables inside or outside the relevant dataframe?

In psychology, it's common to display histograms with an overlaid normal curve. Showing the density of the observed values with `geom_line` as well would make comparison with the normal curve easier, so I wrote another histogram function that does this (`powerHist` in the `userfriendlyscience` package). However, it performs very slowly for large vectors (I'm currently working with 16.7 million datapoints), so I'm trying to make it faster. I used to use `density` to compute the density estimates manually, and then multiply them by the maximum number of datapoints in a bin to scale them to match the histogram.
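For reference, the manual approach described above can be sketched roughly as follows. This is only an illustration under assumptions: the names `x`, `maxBinCount`, and `densityDat` are hypothetical, and rescaling the estimates to a peak of 1 before multiplying is my reading of "scale to match the histogram", not necessarily what `powerHist` does.

```r
# Hypothetical sketch of the manual scaling, using base R only.
x <- mtcars$mpg

# Largest number of datapoints in any of 20 bins.
maxBinCount <- max(table(cut(x, breaks = 20)))

# Kernel density estimate from stats::density().
d <- density(x)

# Assumption: rescale the estimate so its peak is 1, then stretch it
# to the height of the tallest bin.
densityDat <- data.frame(x = d$x,
                         y = d$y / max(d$y) * maxBinCount)
```

The resulting `densityDat` could then be drawn on top of the histogram as a line.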

But this is very slow, and I figured ggplot2 should be able to do this itself. One of the variables computed by `stat_density` is `..scaled..`, which is the density estimate scaled to a maximum of 1. Now I just need to multiply it by the scaling factor. But ggplot2 won't find the variable I use. Multiplying by a constant works fine, but whether or not I place the variable in the dataframe I pass to ggplot2 doesn't seem to matter: ggplot2 can't find it.

``````
scalingFactor <- max(table(cut(mtcars$mpg, breaks=20)));
dat <- data.frame(mpg = mtcars$mpg,
                  scalingFactor = scalingFactor);
ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=20) +
  geom_line(aes(y=..scaled.. * scalingFactor),
            stat='density', color='red');
``````

This yields:

``````
Error in eval(expr, envir, enclos) : object 'scalingFactor' not found
``````

When replacing `scalingFactor` with a regular number, it works:

``````
ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=20) +
  geom_line(aes(y=..scaled.. * 10),
            stat='density', color='red');
``````

Using `scalingFactor` on its own also works:

``````
ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=20) +
  geom_line(aes(y=scalingFactor),
            stat='density', color='red');
``````

So `scalingFactor` seems available; multiplication is available; and clearly `..scaled..` is available. Still, combining them fails. What am I missing here? I can't find anything on 'computation with variables generated by stat' or similar.

Has anybody run into this before? Is it known ggplot2 behavior that I just missed?

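Try `aes_q(y = bquote(..scaled.. * .(scalingFactor)))`. `bquote()` substitutes the current value of `scalingFactor` into the expression, so ggplot2 no longer has to look the variable up when it evaluates `..scaled..`.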
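A minimal sketch of this suggestion applied to the example from the question. The names `yExpr` and `p` are only illustrative, and the plotting part assumes ggplot2 is installed in a version that still provides `aes_q()` and accepts the `..scaled..` notation (both may emit deprecation warnings in recent releases).

```r
# Same scaling factor as in the question: the largest bin count.
scalingFactor <- max(table(cut(mtcars$mpg, breaks = 20)))

# bquote() quotes the expression but evaluates anything wrapped in .(),
# so the number itself is stored in the call instead of the name.
yExpr <- bquote(..scaled.. * .(scalingFactor))
print(yExpr)

# Assumption: ggplot2 is available and still exports aes_q().
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(bins = 20) +
    # aes_q() takes an already-quoted expression for the aesthetic.
    geom_line(aes_q(y = yExpr), stat = "density", color = "red")
  # Call print(p) to draw the plot.
}
```

Because the value is baked into the expression, it does not matter whether `scalingFactor` lives in the data frame or in the calling environment.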