giacomoV - 1 month ago
R Question

R - ggplot2 parallel categorical plot

I am working with categorical longitudinal data. My data has 3 simple variables, like this:

id variable value
1 1 1 c
2 1 2 b
3 1 3 c
4 1 4 c
5 1 5 c
...


Where variable is basically time, and value is one of the 3 possible categories an id can take.

I am interested in producing a "parallel" longitudinal graph with ggplot2, similar to the one that otsplot() from the vcrpart package produces (code at the end of the question).

I am struggling a bit to get it right. What I have come up with so far is this:

dt0 %>% ggplot(aes(variable, value, group = id, colour = id)) +
geom_line(colour="grey70") +
geom_point(aes(colour=value, size = nn), size=4) +
scale_colour_brewer(palette="Set1") + theme_minimal()



The issue with this graph is that we can't really see the "thickness" of the "transitions" (the id lines).

I wondered if you could help me with the following:

a) Make the id lines visible, or make them thicker according to the number of ids going from one state to the other.

b) I would also like to resize the points according to the number of ids in each state. I tried to do it with geom_point(aes(colour=value, size = nn), size=4) but it doesn't seem to work.

Thanks.

# data #
library(dplyr)
library(ggplot2)
library(reshape2) # provides melt()

set.seed(10)

# generate random sequences #
dt = as.data.frame( cbind(id = 1:1000, replicate(5, sample( c('a', 'b', 'c'), prob = c(0.1,0.2,0.7), 1000, replace = TRUE)) ) )

# transform to PP file (long format) #
dt = dt %>% melt(id.vars = c('id'))

# recode variable as the time index 1..5 within each id #
dt0 = dt %>% group_by(id) %>% mutate(variable = 1:n()) %>% arrange(id)

# create the number of people in that state #
dt0 = dt0 %>% count(id, variable, value)
dt0 = dt0 %>% group_by(variable, value, n) %>% mutate(nn = n())

# to produce the first graph # 
library(vcrpart)
otsplot(dt0$variable, factor(dt0$value), dt0$id)

Answer

You were so close with geom_point(aes(colour=value, size = nn), size=4). The problem is that you set size again outside aes() after mapping it inside aes(), so ggplot overrode the mapping to nn with the constant 4. Assuming you want to use nn to scale line thickness as well, you could tweak your code to this:

dt0 %>% ggplot(aes(variable, value, group = id, colour = id)) +
    geom_line(colour="grey70", aes(size = nn)) +
    geom_point(aes(colour=value, size = nn)) + 
    scale_colour_brewer(palette="Set1") + theme_minimal()
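
To see the override in isolation, here is a minimal sketch on a made-up three-row data frame (df and its column w are hypothetical, not part of the question's data). It uses layer_data() to look at the sizes ggplot2 actually computes for the points:

```r
library(ggplot2)

# hypothetical toy data: w plays the role of nn
df <- data.frame(x = 1:3, y = 1:3, w = c(1, 5, 10))

# size mapped in aes() AND set as a constant: the constant wins
p_const <- ggplot(df, aes(x, y)) + geom_point(aes(size = w), size = 4)

# size only mapped in aes(): the point sizes follow w
p_mapped <- ggplot(df, aes(x, y)) + geom_point(aes(size = w))

sizes_const  <- layer_data(p_const)$size   # every point gets size 4
sizes_mapped <- layer_data(p_mapped)$size  # three different sizes
```

Dropping the constant size = 4 is therefore enough for the points to scale with nn.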

If you wanted to use a lag value for the line thickness, I would suggest adding that as a new column in dt0.
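
A rough sketch of what such a column could look like, shown on a small made-up data set rather than dt0 (toy, prev_value and n_trans are names I am assuming for illustration). The idea is to take each id's previous state with dplyr::lag() and then count how many ids share the same transition at each time point:

```r
library(dplyr)

# hypothetical toy person-period data: 3 ids observed at 3 time points
toy <- data.frame(
  id       = rep(1:3, each = 3),
  variable = rep(1:3, times = 3),
  value    = c("a", "b", "c",
               "a", "b", "b",
               "c", "b", "c"),
  stringsAsFactors = FALSE
)

toy_lag <- toy %>%
  arrange(id, variable) %>%
  group_by(id) %>%
  mutate(prev_value = dplyr::lag(value)) %>%  # state at the previous time point (NA at time 1)
  group_by(variable, prev_value, value) %>%
  mutate(n_trans = n()) %>%                   # number of ids making the same transition
  ungroup()
```

n_trans could then be mapped to the line size in place of nn. At the first time point prev_value is NA, so n_trans there is just the number of ids starting in that state. As far as I know, ggplot2 draws each line segment with the aesthetics of the row it starts from, so dplyr::lead() (next state instead of previous state) may line up better with the segments; that is worth checking on your plot.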
