Matherion - 1 year ago
R Question

How can I transform aesthetics 'on the fly' in ggplot using variables inside or outside the relevant dataframe?

In psychology, it's common to display histograms with an overlaid normal curve. Showing the density of the observed values with geom_line as well would make comparison to the normal curve easier, so I wrote another histogram function that does this (it is part of a package). However, it performs very slowly for large vectors (I'm currently working with 16.7 million datapoints), so I'm trying to make it faster. I used to compute the density estimates manually and then multiply them by the maximum number of datapoints in a bin, so that the curve is scaled to match the histogram.
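For illustration, the manual approach described above might look roughly like the sketch below. It assumes base R's density() was used for the estimate and that the curve was first normalised to a maximum of 1; the post confirms neither, and the names x, d and scaledY are made up for this example.

```r
# Hypothetical reconstruction of the manual scaling step (assumptions:
# base R density(), curve normalised to a maximum of 1 before scaling).
x <- mtcars$mpg

# Largest number of observations in any of 20 equal-width intervals
scalingFactor <- max(table(cut(x, breaks = 20)))

# Kernel density estimate of the observed values
d <- density(x)

# Rescale so the peak of the curve equals the tallest bin count
scaledY <- d$y / max(d$y) * scalingFactor
```

The pair d$x and scaledY could then be drawn on top of the histogram. On millions of datapoints, the density() call is the step most likely to dominate the run time.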

But this is very slow, and I figured ggplot2 should be able to do this itself. One of the variables computed by stat_density is ..scaled.., which is the density estimate scaled to a maximum of 1. Now I just need to multiply this by the scaling factor. But ggplot2 won't find the variable I use. Multiplying by a constant works fine, but whether or not I place the variable in the dataframe I pass to ggplot2 doesn't seem to matter: ggplot2 can't find it.

scalingFactor <- max(table(cut(mtcars$mpg, breaks=20)));
dat <- data.frame(mpg = mtcars$mpg,
                  scalingFactor = scalingFactor);
ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=20) +
  geom_line(aes(y=..scaled.. * scalingFactor),
            stat='density', color='red');

This yields:

Error in eval(expr, envir, enclos) : object 'scalingFactor' not found

When replacing scalingFactor with a regular number, it works:

ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=20) +
  geom_line(aes(y=..scaled.. * 10),
            stat='density', color='red');

Histogram with hardcoded scaled density curve

When using scalingFactor on its own, it also works:

ggplot(mtcars, aes(x=mpg)) +
  geom_histogram(bins=20) +
  geom_line(aes(y=scalingFactor),
            stat='density', color='red');

Histogram with horizontal line showing scalingFactor

So ..scaled.. seems available; multiplication is available; and clearly scalingFactor is available. Still, combining them seems to fail. What am I missing here? I can't find anything on 'computation with variables generated by stat' or anything similar.

Has anybody run into this before? Is it known ggplot2 behavior that I just missed?

Answer

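Try aes_q(y=bquote(..scaled.. * .(scalingFactor))). Here bquote() substitutes the value of scalingFactor into the expression before ggplot2 evaluates it, so no variable lookup is needed.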

(Although I would think there is a bug somewhere, since the environment argument in ?ggplot suggests this shouldn't be needed, and in fact it isn't needed when dealing with variables that don't come from a stat.)
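As a rough sketch, here is the suggestion applied to the plot from the question. The name yExpr is introduced only for this example, and the requireNamespace() guard is an addition so the snippet degrades gracefully where ggplot2 is not installed. Note that aes_q() has been deprecated in later ggplot2 releases, so newer versions may emit a warning.

```r
# Same scaling factor as in the question
scalingFactor <- max(table(cut(mtcars$mpg, breaks = 20)))

# bquote() replaces .(scalingFactor) with its current value, so the
# resulting expression contains a number rather than a variable name.
# yExpr is a hypothetical name used only in this sketch.
yExpr <- bquote(..scaled.. * .(scalingFactor))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # aes_q() accepts an already-quoted expression for the aesthetic
  p <- ggplot(mtcars, aes(x = mpg)) +
    geom_histogram(bins = 20) +
    geom_line(aes_q(y = yExpr), stat = "density", color = "red")
  # print(p) renders the histogram with the scaled density curve
}
```

Printing yExpr shows the expression with the number already filled in, which is presumably why it behaves like the hardcoded version that works in the question.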
