giacomoV giacomoV - 11 months ago 58
R Question

R - ggplot2 parallel categorical plot

I am working with categorical longitudinal data. My data has 3 simple variables, for example:

  id variable value
1  1        1     c
2  1        2     b
3  1        3     c
4  1        4     c
5  1        5     c

variable is basically time, and a, b and c are the 3 possible categories that one id can take in value.

I am interested in producing a "parallel" longitudinal graph with ggplot2, similar to this one:

[image: example parallel plot, produced with otsplot]

I am struggling a bit to get it right. What I have come up with so far is this:

dt0 %>% ggplot(aes(variable, value, group = id, colour = id)) +
geom_line(colour="grey70") +
geom_point(aes(colour=value, size = nn), size=4) +
scale_colour_brewer(palette="Set1") + theme_minimal()

[image: plot produced by the ggplot2 code above]

The issue with this graph is that we can't really see the "thickness" of the "transitions" (the lines between states).

I wondered if you could help me with the following:

a) make the geom_line lines more visible, or make them "thicker" according to the number of id going from one state to the other

b) I would also like to size the points according to the number of id in each state. I tried to do it with
geom_point(aes(colour=value, size = nn), size=4)
but it doesn't seem to work.


# data #

library(dplyr)
library(reshape2)  # melt()
library(vcrpart)   # otsplot()

# generate random sequences #
dt = data.frame(id = 1:1000, replicate(5, sample(c('a', 'b', 'c'), 1000, replace = TRUE, prob = c(0.1, 0.2, 0.7))))

# transform to PP (long format) file #
dt = dt %>% melt(id.vars = 'id')

# recode variable as the time index 1..5 within each id #
dt0 = dt %>% group_by(id) %>% mutate(variable = 1:n()) %>% arrange(id)

# create the number of people in each state at each time point #
dt0 = dt0 %>% count(id, variable, value)
dt0 = dt0 %>% group_by(variable, value, n) %>% mutate(nn = n())

# to produce the first graph #
otsplot(dt0$variable, factor(dt0$value), dt0$id)

Answer Source

You were so close with geom_point(aes(colour=value, size = nn), size=4). The problem was that when you redefined size after defining it in aes(), ggplot overwrote the variable reference with the constant 4. Assuming you want to use nn to scale line thickness as well, you could tweak your code to this:

dt0 %>% ggplot(aes(variable, value, group = id, colour = id)) +
    geom_line(colour="grey70", aes(size = nn)) +
    geom_point(aes(colour=value, size = nn)) + 
    scale_colour_brewer(palette="Set1") + theme_minimal()
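
To see the override in isolation, here is a minimal sketch on a made-up three-row data frame (df, p_set and p_map are hypothetical names, not part of the question's data). It compares the size that each layer ends up with when size is both mapped and set, versus only mapped:

```r
library(ggplot2)

# Toy data, invented for illustration only
df <- data.frame(x = 1:3, y = 1:3, nn = c(10, 50, 200))

# size is mapped inside aes() and then set again as a constant
p_set <- ggplot(df, aes(x, y)) + geom_point(aes(size = nn), size = 4)

# size is only mapped inside aes()
p_map <- ggplot(df, aes(x, y)) + geom_point(aes(size = nn))

# layer_data() returns the aesthetics actually used to draw a layer
size_set <- layer_data(p_set)$size  # the constant wins: same size for every point
size_map <- layer_data(p_map)$size  # sizes follow nn: one distinct size per row
```

Inspecting size_set and size_map should show that the constant replaces the mapping, which is why dropping size=4 makes the points scale with nn.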

If you wanted to use a lag value for the line thickness, I would suggest adding that as a new column in dt0.

[image: plot produced by the code above]
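
One possible way to follow the lag suggestion is sketched below. It is not the only approach: instead of adding a column to dt0, it builds a separate table of transitions with lead() (the forward-looking counterpart of lag()) and draws them with geom_segment. The toy data, the seed, and the names transitions, states, next_variable, next_value and n_trans are all assumptions made for this illustration. The linewidth aesthetic assumes ggplot2 3.4.0 or later; on older versions size would take its place.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Toy long-format data shaped like dt0: one row per id and time point
dt0 <- expand.grid(id = 1:200, variable = 1:5) %>%
  mutate(value = sample(c("a", "b", "c"), n(), replace = TRUE,
                        prob = c(0.1, 0.2, 0.7)))

# Pair each state with the next state of the same id, then count the pairs
transitions <- dt0 %>%
  group_by(id) %>%
  arrange(variable, .by_group = TRUE) %>%
  mutate(next_variable = lead(variable), next_value = lead(value)) %>%
  ungroup() %>%
  filter(!is.na(next_value)) %>%  # the last time point has no next state
  count(variable, value, next_variable, next_value, name = "n_trans")

# Number of ids in each state at each time point, used for point size
states <- dt0 %>% count(variable, value, name = "nn")

p <- ggplot() +
  geom_segment(data = transitions,
               aes(x = variable, y = value,
                   xend = next_variable, yend = next_value,
                   linewidth = n_trans),
               colour = "grey70", lineend = "round") +
  geom_point(data = states, aes(variable, value, colour = value, size = nn)) +
  scale_colour_brewer(palette = "Set1") +
  theme_minimal()
```

Printing p draws one segment per distinct transition rather than one line per id, so thickness reflects how many ids make that move, and overplotting of identical lines is avoided.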