MichaelChirico - 3 months ago
R Question

Plotting by group in data.table

I've got individual-level data for which I'm trying to summarize an outcome dynamically by group.

Example:

library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))


I want to know the simplest way to plot the group-level summary, the data for which is contained in:

DT[ , mean(outcome), by = .(grp, time)]


I wanted something like:

DT[ , plot(mean(outcome)), by = .(grp, time)]


But this doesn't work at all.

The workable option I am surviving on (which could be looped pretty easily) is:

plot(DT[grp == "a", mean(outcome), by = time])
lines(DT[grp == "b", mean(outcome), by = time])
lines(DT[grp == "c", mean(outcome), by = time])
lines(DT[grp == "d", mean(outcome), by = time])


(with added parameters for colors, etc, excluded for conciseness)
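
The looped version of the workaround above might look like the following. This is only a sketch: the names agg, grps and cols, and the choice of palette indices for colors, are illustrative assumptions rather than part of the original code.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

# One row per (grp, time) cell holding the group mean.
agg <- DT[ , .(mean_outcome = mean(outcome)), by = .(grp, time)]

grps <- unique(agg$grp)
cols <- seq_along(grps)  # assumed color choice: palette indices 1..4

# Empty frame whose axes cover every group's line.
plot(range(agg$time), range(agg$mean_outcome), type = "n",
     xlab = "time", ylab = "mean outcome")

# Add one line per group.
for (i in seq_along(grps)) {
  sub <- agg[grp == grps[i]]
  lines(sub$time, sub$mean_outcome, col = cols[i])
}
legend("topright", legend = grps, col = cols, lty = 1)
```

Setting up the empty frame first keeps the y-axis wide enough for all groups, which the plot()-then-lines() sequence does not guarantee.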

This strikes me as not the best way to do this. Given data.table's strength in handling groups, is there not a more elegant solution?

Other sources have been pointing me to matplot, but I can't see a straightforward way to use it. Do I need to reshape DT, and is there a simple reshape that would get the job done?

Answer

Base graphics solution using matplot, with data.table's dcast for the reshape

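# Mean outcome for each (grp, time) cell
dt_agg <- DT[ , .(mean = mean(outcome)), by = .(grp, time)]
# Reshape to wide: one row per time, one column per group
dt_cast <- dcast(dt_agg, time ~ grp, value.var = "mean")
# Plot every group column against time, one line per group
dt_cast[ , matplot(time, .SD[ , !"time", with = FALSE],
                   type = "l", ylab = "mean", xlab = "")]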

Result: [image not preserved: line plot of mean outcome against time, one line per group]
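
As a possible shortcut, the aggregate and reshape steps can be folded into one dcast call through its fun.aggregate argument. The sketch below assumes the same example data as the question; the names wide and grp_cols and the legend placement are illustrative.

```r
library(data.table)

set.seed(12039)
DT <- data.table(id = rep(1:100, each = 50),
                 grp = rep(letters[1:4], each = 1250),
                 time = rep(1:50, 100),
                 outcome = rnorm(5000))

# Aggregate and reshape in one call:
# one row per time, one column per grp, each cell the mean outcome.
wide <- dcast(DT, time ~ grp, value.var = "outcome", fun.aggregate = mean)

# matplot draws one line per column of the y matrix.
grp_cols <- setdiff(names(wide), "time")
matplot(wide$time, as.matrix(wide[ , grp_cols, with = FALSE]),
        type = "l", lty = 1, xlab = "time", ylab = "mean outcome")
# matplot's default colors start at 1, 2, 3, ..., so the legend matches them.
legend("topright", legend = grp_cols, col = seq_along(grp_cols), lty = 1)
```

This should give the same wide table as the two-step version, so the choice between them is mostly a matter of whether the intermediate long-format means are needed elsewhere.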