chandler - 1 month ago
R Question

ggplot2 is mutating/transforming size variables -- how to get back original data?

ggplot2 seems to be mutating/transforming the size variable.

Consider the following:

require(ggplot2); require(dplyr)
set.seed(1234)
d <- data.frame(x = rnorm(100), y = rnorm(100), size = runif(100))
p.out <- ggplot(d, aes(x, y, size = size)) + geom_point()
p.data <- p.out %>% layer_data %>% arrange(x)
d2 <- d %>% arrange(x)
head(d2)
## x y size
## 1 -2.345698 -0.50247778 0.7757949
## 2 -2.180040 -0.31611833 0.3802893
## 3 -1.806031 -0.37723765 0.2547007
## 4 -1.629093 -1.65010093 0.2722072
## 5 -1.448205 0.08005964 0.1999333
## 6 -1.390701 -1.12376279 0.5117742

p.data %>% select(size, x, y) %>% head

## size x y
## 1 5.407443 -2.345698 -0.50247778
## 2 4.084550 -2.180040 -0.31611833
## 3 3.523348 -1.806031 -0.37723765
## 4 3.608829 -1.629093 -1.65010093
## 5 3.234916 -1.448205 0.08005964
## 6 4.579018 -1.390701 -1.12376279


x and y seem to match the original data:

lm(y ~ x, p.data)


## Call:
## lm(formula = y ~ x, data = p.data)
##
## Coefficients:
## (Intercept) x
## 0.03715 -0.02608

lm(y ~ x, d)

## Call:
## lm(formula = y ~ x, data = d)
##
## Coefficients:
## (Intercept) x
## 0.03715 -0.02608


But the size variable seems to have been mutated/transformed somehow:

cor(p.data$size, d2$size)
## [1] 0.9783827

lm(y ~ x, data = d, weights = size)

## Call:
## lm(formula = y ~ x, data = d, weights = size)
##
## Coefficients:
## (Intercept) x
## -0.02586 -0.11537

lm(y ~ x, p.data, weights = size)

## Call:
## lm(formula = y ~ x, data = p.data, weights = size)
##
## Coefficients:
## (Intercept) x
## 0.009372 -0.065445


ggplot2 seems to be producing the correct plot when I use the original data, but I can't seem to reproduce the plot from layer_data() or from ggplot_build(). How can I transform the size variable in p.data to get back the original size variable?

Answer

Something interesting is going on here; perhaps someone more intimately familiar with ggplot2 can chime in. In the meantime, try pulling the original data from the ggplot object directly with p.out$data.
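
A likely explanation, offered as a sketch rather than a definitive account: layer_data() returns aesthetics after the scales have been applied, and the default continuous size scale appears to rescale the data to the unit interval, take a square root (so that point area tracks the value), and map the result onto the range c(1, 6). The numbers in the question are consistent with this. Under that assumption the mapping can be inverted as below. The names out_range, limits, unit and recovered are illustrative only, and limits is taken from the original data here, which presumes no custom limits or range were set on the scale.

```r
library(ggplot2)

set.seed(1234)
d <- data.frame(x = rnorm(100), y = rnorm(100), size = runif(100))
p.out <- ggplot(d, aes(x, y, size = size)) + geom_point()

# The untouched input data is stored on the plot object itself.
original <- p.out$data

# layer_data() returns the values after the size scale has been applied.
p.data <- layer_data(p.out)

# Assumption: default continuous size scale, i.e. output range c(1, 6),
# a square-root (area) palette, and limits equal to the range of the data.
out_range <- c(1, 6)
limits <- range(d$size)

# Invert the mapping: undo the rescale onto out_range, square to undo the
# square root, then map the unit interval back onto the data limits.
unit <- ((p.data$size - out_range[1]) / diff(out_range))^2
recovered <- limits[1] + unit * diff(limits)

# Compare with the original column, ordering both by x to align rows.
all.equal(recovered[order(p.data$x)], d$size[order(d$x)])
```

If the comparison holds, recovered can be used as the weights in lm() in place of p.data$size. When the original data frame is still available, p.out$data is the simpler route, since no inversion is needed.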