MichaelChirico - 1 year ago 82
R Question

# Strange error plotting by group

Sorry for the massive data dump, but I can't reproduce this on the subsets of the data I've tried. I copy-pasted the `dput` of the data (165 obs., not crazy) to this Gist.

I'm trying to plot the data in `DT` by `sport`, according to:

1. Create an empty plot with proper limits to accommodate all data

2. Plot the column `gini` as a scatterplot, with colors varying by `sport`

3. Plot the column `five_yr_ma` as a line, with color matching that in 2.

This should be simple and I've done things like it before. Here's what should work:

``````
# empty plot with proper axes
DT[ , plot(
  NA, ylim = range(gini), xlim = range(season),
  xlab = "Season", ylab = "Gini",
  main = "Comparison of Gini Coefficient Across Sports")]

# pick colors for each sport
cols <- c(NHL = "black", NBA = "red")

DT[ , {
  points(season, gini, col = cols[.BY$sport])
  lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)},
  by = sport]
``````

But this gives me output/error:

``````
# Empty data.table (0 rows) of 1 col: sport
``````

Error: `x` and `y` lengths differ in `plot.xy()`

This is strange. If we skip the grouping and just do it manually, it works perfectly fine:

``````
DT[sport == "NBA", {
  points(season, gini, col = "red")
  lines(season, five_yr_ma, col = "red", lwd = 3)}]

DT[sport == "NHL", {
  points(season, gini, col = "black")
  lines(season, five_yr_ma, col = "black", lwd = 3)}]
``````
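The two manual calls above can be folded into a loop over the sport names, which avoids `by` altogether. This is only a sketch: the table below is a synthetic stand-in for the Gist data (the seasons and `gini` values are made up; only the group sizes of 98 and 67 match the question), and whether a loop sidesteps the problem on every machine is an assumption based on the manual subsets working.

```r
library(data.table)

# Synthetic stand-in for the Gist data: values are invented for illustration
set.seed(1)
DT <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(98, 67)),
  season = c(1917:2014, 1948:2014)
)
DT[ , gini := runif(.N, 0.1, 0.3)]
# trailing 5-season moving average within each sport (first 4 values are NA)
DT[ , five_yr_ma := as.numeric(stats::filter(gini, rep(1/5, 5), sides = 1)),
    by = sport]

cols <- c(NHL = "black", NBA = "red")

pdf(NULL)  # null device so this runs without a display; drop it interactively
plot(NA, xlim = range(DT$season), ylim = range(DT$gini),
     xlab = "Season", ylab = "Gini")

n_drawn <- integer(0)
for (s in names(cols)) {
  sub <- DT[sport == s]  # materialise the subset before plotting
  points(sub$season, sub$gini, col = cols[[s]])
  lines(sub$season, sub$five_yr_ma, col = cols[[s]], lwd = 3)
  n_drawn[s] <- nrow(sub)  # record how many rows were drawn per sport
}
dev.off()
```

Each iteration plots from its own freshly subset table, so nothing is shared between groups while the device draws.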

Moreover, even in the context of grouping, it's unclear why `plot.xy` has received arguments of different lengths. If we make the following adjustment to force R to record the inputs just before they're sent, there doesn't appear to be any issue:

``````
DT[ , {
  cat("\n\nPlotting for sport: ", .BY$sport)
  # assign inside the calls to capture exactly what points()/lines() receive
  points(x1 <- season, y1 <- gini, col = cols[.BY$sport])
  lines(x2 <- season, y2 <- five_yr_ma, col = cols[.BY$sport], lwd = 3)
  cat("\npoints/season: ", length(x1),
      "\npoints/gini: ", length(y1),
      "\nlines/season: ", length(x2),
      "\nlines/five_yr_ma: ", length(y2))},
  by = sport]
``````
``````

Has output:

``````
# Plotting for sport:  NHL
# points/season:  98
# points/gini:  98
# lines/season:  98
# lines/five_yr_ma:  98

# Plotting for sport:  NBA
# points/season:  67
# points/gini:  67
# lines/season:  67
# lines/five_yr_ma:  67
``````

What could be going on??

Since it appears that this is not common across machines, here's my `sessionInfo()`:

``````
R version 3.2.4 (2016-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.9.7

loaded via a namespace (and not attached):
[1] rsconnect_0.4.1.11 tools_3.2.4
``````

Indeed, as @Arun points out, it seems this is a resurfacing of the (as yet unsolved) issue which was causing the error in this question:

Values of the wrong group are used when using plot() within a data.table() in RStudio

As @Arun discovered there, RStudio's native graphics device seems to get tripped up by the changing pointers used for the different subgroups created when evaluating `j` when `by` is present. This suggests the workaround of simply `copy`ing the plotted columns, or all of `.SD`, for each group, like:

``````
points(copy(season), copy(gini),
       col = cols[.BY$sport])
lines(copy(season), copy(five_yr_ma),
      col = cols[.BY$sport], lwd = 3)
``````

Or

``````
x <- copy(.SD)
# x is already a copy, so its columns can be passed directly
with(x, {
  points(season, gini, col = cols[.BY$sport])
  lines(season, five_yr_ma, col = cols[.BY$sport], lwd = 3)})
``````

Both worked for me. Since the subgroups are so small, there's no computational efficiency concern here; we can `copy` away without noticeably affecting performance.
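For anyone who wants to try the workaround without the Gist, here is a minimal self-contained sketch of the first variant placed inside the grouped call. The data are a synthetic stand-in (invented seasons and `gini` values; only the group sizes match the question), and returning `list(n = .N)` from `j` is my addition so the result shows how many rows each group drew.

```r
library(data.table)

# Synthetic stand-in for the Gist data: values are invented for illustration
set.seed(1)
DT <- data.table(
  sport  = rep(c("NHL", "NBA"), times = c(98, 67)),
  season = c(1917:2014, 1948:2014)
)
DT[ , gini := runif(.N, 0.1, 0.3)]
# trailing 5-season moving average within each sport (first 4 values are NA)
DT[ , five_yr_ma := as.numeric(stats::filter(gini, rep(1/5, 5), sides = 1)),
    by = sport]

cols <- c(NHL = "black", NBA = "red")

pdf(NULL)  # null device so this runs without a display; drop it interactively
DT[ , plot(NA, ylim = range(gini), xlim = range(season),
           xlab = "Season", ylab = "Gini")]

n_plotted <- DT[ , {
  # copy() hands the device its own vectors rather than the per-group buffers
  points(copy(season), copy(gini), col = cols[.BY$sport])
  lines(copy(season), copy(five_yr_ma), col = cols[.BY$sport], lwd = 3)
  list(n = .N)  # return the group size instead of an empty result
}, by = sport]
dev.off()
```

`n_plotted` then has one row per sport, which is an easy check that both groups were visited with the expected number of rows.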

This is #1524 at the `data.table` GitHub page and I've filed this bug report at RStudio Support; will update this if a fix is pushed.
