EA00 - 1 year ago 97

# R - difference scatter plot

I was wondering if there is a way to subtract two binned scatter plots from one another in R. I have two distributions with the same axes, and I want to overlay one on top of the other and subtract them, producing a difference scatter plot.

Here are my two plots:

and my script for the plots:

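``````
library(hexbin)
library(RColorBrewer)

setwd("/Users/home/")
df <- read.table("data1.txt")
x <- df$c2
y <- df$c3

# Bin the points into a hexagonal grid, 2000 bins across the x range
bin <- hexbin(x, y, xbins = 2000)
# Colour ramp built from the reversed ColorBrewer "Spectral" palette
my_colors <- colorRampPalette(rev(brewer.pal(11, 'Spectral')))
d <- plot(bin, main = "", colramp = my_colors, legend = FALSE)
``````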

Any advice on how to go about this would be very helpful.

Answer

Alright, as a starting point, here is some sample data. Both sets are random normal draws; the second is shifted so that it is centered on (2, 2).

``````
df1 <-
  data.frame(
    x = rnorm(1000)
    , y = rnorm(1000)
  )

df2 <-
  data.frame(
    x = rnorm(1000, 2)
    , y = rnorm(1000, 2)
  )
``````

To ensure that the bins are identical for both data sets, it is best to construct a single `hexbin` object from the combined data. To accomplish this, I am using `dplyr`'s `bind_rows` to keep track of which data.frame each row came from (this would be even easier if you had a single data.frame with a grouping variable).

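``````
library(dplyr)
library(hexbin)

# Stack both data.frames; the new "df" column holds "A" or "B"
bothDF <-
  bind_rows(A = df1, B = df2, .id = "df")

# One hexbin object for all points, so both data sets share the same cells.
# IDs = TRUE stores the cell ID of every point (needed by hexTapply below)
bothHex <-
  hexbin(x = bothDF$x
         , y = bothDF$y
         , IDs = TRUE
  )
``````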
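Because `IDs = TRUE` is set, the `hexbin` object records in its `cID` slot the cell that each input row fell into, while the `cell` and `count` slots list the non-empty cells and the combined number of points in each. The sketch below inspects these slots; it assumes the same simulated data as above, with an added `set.seed()` (not part of the original answer) so that it is reproducible.

``````r
library(hexbin)
library(dplyr)

set.seed(1)  # added only for reproducibility
df1 <- data.frame(x = rnorm(1000), y = rnorm(1000))
df2 <- data.frame(x = rnorm(1000, 2), y = rnorm(1000, 2))

# Combined data; the "df" column records the source ("A" or "B")
bothDF <- bind_rows(A = df1, B = df2, .id = "df")

# A single hexbin object, so both data sets share exactly the same cells
bothHex <- hexbin(x = bothDF$x, y = bothDF$y, IDs = TRUE)

# cID: one cell ID per input row, in the same order as bothDF
length(bothHex@cID)
# cell/count: the non-empty cells and the total number of points in each
head(data.frame(cell = bothHex@cell, count = bothHex@count))
``````

Since every row of `bothDF` has a cell ID, splitting those IDs by the `df` column is all that is needed to count each data set per cell.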

Next, we use a mix of `hexbin` and `dplyr` to count the occurrences of each data set within each cell. First, `hexTapply` applies `table` across the bins, building one small table per cell (the grouping column must be passed through `factor` so that both levels appear in every table, even when one count is zero; this is not needed if your column is already a factor). The result is then simplified to a matrix, transposed, and converted to a data.frame, which is modified with `mutate` to calculate the difference in counts and finally joined to a table that gives the x and y coordinates for each cell ID.

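``````
counts <-
  # For each cell, tabulate how many points came from A and from B
  hexTapply(bothHex, factor(bothDF$df), table) %>%
  # Combine the per-cell tables into a 2 x (number of cells) matrix ...
  simplify2array %>%
  # ... and transpose it so there is one row per cell
  t %>%
  data.frame() %>%
  # The row names are the cell IDs; diff is the per-cell difference
  mutate(id = as.numeric(row.names(.))
         , diff = A - B) %>%
  # Attach the x/y center of each cell
  left_join(data.frame(id = bothHex@cell, hcell2xy(bothHex)), by = "id")
``````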
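If the `hexTapply`/`simplify2array`/`t` chain feels opaque, the same per-cell counts can also be obtained with a single base-R `table()` call on the cell IDs. This is an alternative sketch, not the approach used in the answer, and it again assumes the simulated data with an added `set.seed()`. Here the cell IDs are turned into a factor with `levels = bothHex@cell` so that the rows come out in the same order as `bothHex@cell` and therefore line up with `hcell2xy()`.

``````r
library(hexbin)
library(dplyr)

set.seed(1)  # added only for reproducibility
df1 <- data.frame(x = rnorm(1000), y = rnorm(1000))
df2 <- data.frame(x = rnorm(1000, 2), y = rnorm(1000, 2))
bothDF  <- bind_rows(A = df1, B = df2, .id = "df")
bothHex <- hexbin(x = bothDF$x, y = bothDF$y, IDs = TRUE)

# Cross-tabulate cell ID against source data set; one row per non-empty cell
tab <- table(cell = factor(bothHex@cID, levels = bothHex@cell),
             df   = bothDF$df)

# Same information as `counts`: A, B, id, cell centers, and the difference
counts_base <- data.frame(
  A  = as.vector(tab[, "A"]),
  B  = as.vector(tab[, "B"]),
  id = bothHex@cell,
  hcell2xy(bothHex)
)
counts_base$diff <- counts_base$A - counts_base$B
head(counts_base)
``````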

`head(counts)` gives:

``````
  A B  id diff          x         y
1 1 0   7    1 -1.3794467 -3.687014
2 1 0  71    1 -0.8149939 -3.178209
3 1 0  79    1  1.4428172 -3.178209
4 1 0  99    1 -1.5205599 -2.923806
5 2 0 105    2  0.1727985 -2.923806
6 1 0 107    1  0.7372513 -2.923806
``````
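Because each sample data.frame contributes exactly 1000 points, `counts` should satisfy a few simple identities: `A` and `B` each sum to 1000, `diff` therefore sums to zero, and there is one row per non-empty cell with coordinates attached. A quick check along those lines, sketched under the same simulated setup (with an added `set.seed()`), can confirm that no cells were dropped or double-counted by the pipeline:

``````r
library(hexbin)
library(dplyr)

set.seed(1)  # added only for reproducibility
df1 <- data.frame(x = rnorm(1000), y = rnorm(1000))
df2 <- data.frame(x = rnorm(1000, 2), y = rnorm(1000, 2))
bothDF  <- bind_rows(A = df1, B = df2, .id = "df")
bothHex <- hexbin(x = bothDF$x, y = bothDF$y, IDs = TRUE)

counts <-
  hexTapply(bothHex, factor(bothDF$df), table) %>%
  simplify2array %>%
  t %>%
  data.frame() %>%
  mutate(id = as.numeric(row.names(.))
         , diff = A - B) %>%
  left_join(data.frame(id = bothHex@cell, hcell2xy(bothHex)), by = "id")

stopifnot(
  sum(counts$A) == nrow(df1),                      # every point of df1 counted once
  sum(counts$B) == nrow(df2),                      # every point of df2 counted once
  sum(counts$diff) == nrow(df1) - nrow(df2),       # differences cancel out overall
  nrow(counts) == length(bothHex@cell),            # one row per non-empty cell
  !anyNA(counts$x), !anyNA(counts$y)               # every cell got its center
)
``````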

Finally, we use `ggplot2` to plot the result, since it offers more control than `hexbin`'s own plotting, including the ability to use a variable other than the count (here, `diff`) as the fill.

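``````
library(ggplot2)

counts %>%
  ggplot(aes(x = x, y = y
             , fill = diff)) +
  # stat = "identity": draw the precomputed cells instead of re-binning
  geom_hex(stat = "identity") +
  coord_equal() +
  # Diverging scale centered on 0 (white): cells with more A and cells
  # with more B get different colors
  scale_fill_gradient2()
``````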
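`scale_fill_gradient2()` already centers its color scale on zero (its `midpoint` defaults to 0), so cells where the two data sets agree are drawn in the middle color. One optional refinement, offered here as an assumption rather than part of the original answer, is to make the limits symmetric so that equally large positive and negative differences get equally strong colors, and to pick the colors explicitly. The `lim` variable, the color names, and the legend title below are illustrative choices; the data setup repeats the simulated example with an added `set.seed()`.

``````r
library(hexbin)
library(dplyr)
library(ggplot2)

set.seed(1)  # added only for reproducibility
df1 <- data.frame(x = rnorm(1000), y = rnorm(1000))
df2 <- data.frame(x = rnorm(1000, 2), y = rnorm(1000, 2))
bothDF  <- bind_rows(A = df1, B = df2, .id = "df")
bothHex <- hexbin(x = bothDF$x, y = bothDF$y, IDs = TRUE)

counts <-
  hexTapply(bothHex, factor(bothDF$df), table) %>%
  simplify2array %>%
  t %>%
  data.frame() %>%
  mutate(id = as.numeric(row.names(.)), diff = A - B) %>%
  left_join(data.frame(id = bothHex@cell, hcell2xy(bothHex)), by = "id")

# Largest absolute difference, used to make the color limits symmetric
lim <- max(abs(counts$diff))

p <- ggplot(counts, aes(x = x, y = y, fill = diff)) +
  geom_hex(stat = "identity") +
  coord_equal() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-lim, lim),
                       name = "A - B")
print(p)
``````

With symmetric limits, a cell where A exceeds B by 3 is exactly as saturated as one where B exceeds A by 3, which makes the two sides of the difference plot easier to compare by eye.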

From there, it is easy to play around with axes, colors, etc.
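To apply this to the two data files from the question, the answer's steps can be wrapped in a helper. `hex_diff()` below is a hypothetical name, not part of any package; it reuses the answer's pipeline, and adds `xcol`/`ycol` arguments (so the question's `c2`/`c3` columns can be used) and an `xbins` argument that is passed to the single shared `hexbin()` call, so the grid resolution can be matched to the original plots. The second file name in the usage comment (`data2.txt`) is a placeholder.

``````r
library(hexbin)
library(dplyr)
library(ggplot2)

# Hypothetical helper: bin two data.frames on one shared hex grid and return
# per-cell counts (A, B), their difference, and the cell centers (x, y).
hex_diff <- function(dfA, dfB, xcol = "x", ycol = "y", xbins = 30) {
  # Stack both inputs, recording their origin in the "df" column
  both <- bind_rows(
    A = data.frame(x = dfA[[xcol]], y = dfA[[ycol]]),
    B = data.frame(x = dfB[[xcol]], y = dfB[[ycol]]),
    .id = "df"
  )
  # One shared grid for both data sets
  hb <- hexbin(both$x, both$y, xbins = xbins, IDs = TRUE)

  hexTapply(hb, factor(both$df), table) %>%
    simplify2array %>%
    t %>%
    data.frame() %>%
    mutate(id = as.numeric(row.names(.)), diff = A - B) %>%
    left_join(data.frame(id = hb@cell, hcell2xy(hb)), by = "id")
}

# Usage with simulated data. For the question's files this might look like
# hex_diff(read.table("data1.txt"), read.table("data2.txt"),
#          xcol = "c2", ycol = "c3", xbins = 2000)
set.seed(1)  # added only for reproducibility
d <- hex_diff(data.frame(x = rnorm(1000), y = rnorm(1000)),
              data.frame(x = rnorm(1000, 2), y = rnorm(1000, 2)))

p <- ggplot(d, aes(x = x, y = y, fill = diff)) +
  geom_hex(stat = "identity") +
  coord_equal() +
  scale_fill_gradient2()
print(p)
``````

Keeping both data sets in one `hexbin()` call inside the helper is what guarantees that the subtraction compares like cells with like cells; binning each file separately would give two grids that do not necessarily align.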
