chandler - 9 months ago 31

R Question

ggplot2 seems to be mutating/transforming size variables.

Consider the following

require(ggplot2); require(dplyr)

set.seed(1234)

d <- data.frame(x = rnorm(100), y = rnorm(100), size = runif(100))

p.out <- ggplot(d, aes(x, y, size = size)) + geom_point()

p.data <- p.out %>% layer_data %>% arrange(x)

d2 <- d %>% arrange(x)

head(d2)

## x y size

## 1 -2.345698 -0.50247778 0.7757949

## 2 -2.180040 -0.31611833 0.3802893

## 3 -1.806031 -0.37723765 0.2547007

## 4 -1.629093 -1.65010093 0.2722072

## 5 -1.448205 0.08005964 0.1999333

## 6 -1.390701 -1.12376279 0.5117742

p.data %>% select(size, x, y) %>% head

## size x y

## 1 5.407443 -2.345698 -0.50247778

## 2 4.084550 -2.180040 -0.31611833

## 3 3.523348 -1.806031 -0.37723765

## 4 3.608829 -1.629093 -1.65010093

## 5 3.234916 -1.448205 0.08005964

## 6 4.579018 -1.390701 -1.12376279

x and y seem to match the original data

lm(y ~ x, p.data)

## Call:

## lm(formula = y ~ x, data = p.data)

##

## Coefficients:

## (Intercept) x

## 0.03715 -0.02608

lm(y ~ x, d)

## Call:

## lm(formula = y ~ x, data = d)

##

## Coefficients:

## (Intercept) x

## 0.03715 -0.02608

But the size variable seems to have been mutated/transformed somehow:

cor(p.data$size, d2$size)

## [1] 0.9783827

lm(y ~ x, data = d, weights = size)

## Call:

## lm(formula = y ~ x, data = d, weights = size)

##

## Coefficients:

## (Intercept) x

## -0.02586 -0.11537

lm(y ~ x, p.data, weights = size)

## Call:

## lm(formula = y ~ x, data = p.data, weights = size)

##

## Coefficients:

## (Intercept) x

## 0.009372 -0.065445

ggplot2 seems to be producing the correct plot when I use the original data, but I can't seem to reproduce the plot from `layer_data()` or `ggplot_build()`, given the `size` values in `p.data`.

Answer Source

There is something interesting going on; perhaps someone more intimately familiar with ggplot2 can chip in. In the meantime, try retrieving the data from the ggplot object directly using `p.out$data`.
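A minimal sketch of that workaround, reusing the question's own setup. The last few lines go beyond what this thread confirms: they assume the default continuous size scale rescales the values to [0, 1], takes the square root, and maps the result onto the range 1 to 6. The names `orig`, `built`, `rng`, and `guess` are only illustrative.

```r
library(ggplot2)

set.seed(1234)
d <- data.frame(x = rnorm(100), y = rnorm(100), size = runif(100))
p.out <- ggplot(d, aes(x, y, size = size)) + geom_point()

# The data frame passed to ggplot() is stored on the plot object as-is,
# so the size column here is still on its original scale.
orig <- p.out$data

# Weighted fit using the untransformed size values
fit <- lm(y ~ x, data = orig, weights = size)
coef(fit)

# ASSUMPTION (not confirmed in this thread): layer_data() returns size after
# the default size scale has been applied, i.e. rescale to [0, 1], take the
# square root, then map onto the range c(1, 6).
built <- layer_data(p.out)
rng <- range(d$size)
guess <- 1 + 5 * sqrt((d$size - rng[1]) / (rng[2] - rng[1]))

# Compare the scaled sizes from the built plot with the guessed mapping
all.equal(built$size, guess)
```

If the comparison holds, the `size` column in `layer_data()` is a plotting aesthetic (point size) rather than the raw variable, which would explain why using it as regression weights gives different coefficients.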