Misha Misha - 7 days ago 3
R Question

Visualizing sequence of states according to an incidence date using TraMineR

I am trying to generate a plot relating the ne2 sequence of states to an incidence date in ne3 (data below). I have data spanning an 11-year period from 2004 to 2015. The incidence date (ne3$date_inc) also falls within this 11-year period, but the incidence dates differ between ids. I'd like to use the incidence date as the reference, so that the distribution of states before and after the incidence date for each id can be visualized using seqdplot, with an x axis that shares a common reference (i.e. months before and after the incidence date). However, referencing the state dates to the incidence date as zero results in negative values for the states occurring before the incidence. Any idea if this can be done using TraMineR? Or other suggestions?

library(TraMineR)
ne2 <- structure(list(id = c(4885109L, 4885109L, 4885109L, 7673891L,
11453161L, 13785017L, 13785017L, 16400365L), status = structure(c(4L,
2L, 3L, 4L, 4L, 1L, 5L, 4L), .Label = c("A", "B", "C", "D", "E"
), class = "factor"), date_start = structure(c(12432, 15262,
15385, 12432, 12432, 12432, 14318, 12432), class = "Date"), date_end = structure(c(15262,
15385, 16450, 16450, 16450, 14318, 16450, 16450), class = "Date")), class = "data.frame", .Names = c("id",
"status", "date_start", "date_end"), row.names = c(NA, -8L))

ne3 <- structure(list(id = c(4885109L, 7673891L, 11453161L, 13785017L,
16400365L), date_inc = structure(c(15170, 13406, 13528, 13559,
15598), class = "Date")), .Names = c("id", "date_inc"), class = "data.frame", row.names = c(NA,
-5L))

Answer

Here is how you can make the sequences align on their incidence date.

We start by transforming your SPELL data into the STS format used by TraMineR. Since the sequences are longer than 100 positions, we have to specify the maximum number of columns (limit) of the table that will store the sequences. So we first compute the maximum length of the sequences:

limit <- max(ne2$date_end) - min(ne2$date_start)

Now we transform the SPELL data into the STS form

ne2.sts <- seqformat(ne2, id='id', begin='date_start', end='date_end', status='status',
                     from='SPELL', to='STS', limit=as.numeric(limit), process=FALSE)

dim(ne2.sts)
## [1]    5 4019

Note that since the start and end dates are provided in Date format, a daily time granularity is used. As a consequence, we get very long sequences of 4019 days.
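To see where that length comes from, here is a small base R sketch of the arithmetic. It only uses the first start date and last end date copied from the ne2 data in the question; the variable names are mine, and the month figure is a rough average-month approximation rather than anything TraMineR computes.

```r
# First start and last end dates, copied from ne2 (days since 1970-01-01)
first_start <- as.Date(12432, origin = "1970-01-01")
last_end    <- as.Date(16450, origin = "1970-01-01")

# Subtracting two Date objects gives a difference in days
span_days <- as.numeric(last_end - first_start)
span_days                                   # 4018

# Rough equivalent in months, using an average month of 30.4375 days
span_months <- round(span_days / 30.4375)
span_months                                 # 132
```

So the same observation window needs thousands of columns at daily granularity but only a little over a hundred at monthly granularity.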

Now, we need to shift the sequences to align their incidence date. This can be done with the seqstart function of TraMineRextras.

The shift for each sequence is the difference between its incidence date and the earliest incidence date. So we set the new start date as

ne3$bd <- ne3$date_inc - min(ne3$date_inc) + min(ne2$date_start)
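As a sanity check on this arithmetic, the following base R sketch recomputes the shifts from the dates in the question and verifies that, once each sequence is moved by its own shift, the incidence falls at the same offset for every id. The names shift, new_start and aligned_pos are mine, and this only illustrates the date arithmetic, not what seqstart does internally.

```r
# Incidence dates and first observed date, copied from ne3 and ne2 above
date_inc   <- as.Date(c(15170, 13406, 13528, 13559, 15598), origin = "1970-01-01")
data_start <- as.Date(12432, origin = "1970-01-01")

# Shift of each sequence, in days, relative to the earliest incidence date
shift <- as.numeric(date_inc - min(date_inc))
shift        # 1764 0 122 153 2192

# Same dates as ne3$bd: the first observed date moved forward by the shift
new_start <- data_start + shift

# Day offset of the incidence within each original sequence, minus its shift.
# It is identical for all ids, which is what gives the common reference point.
aligned_pos <- as.numeric(date_inc - data_start) - shift
aligned_pos  # 974 for every id
```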

We load TraMineRextras to gain access to seqstart

library(TraMineRextras)

We shift the sequences, create the state sequence object and plot it with seqdplot

ne2.sts.a <- seqstart(ne2.sts, data.start=min(ne2$date_start), new.start=ne3$bd)
ne2.a.seq <- seqdef(ne2.sts.a)
seqdplot(ne2.a.seq, border=NA)


Note that due to the length of the sequences, it takes a few minutes to generate the plot. I would suggest using monthly data instead of daily data.
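One possible way to get monthly positions is to convert each date to a month index counted from the first observed month. The sketch below is base R only; month_index is a hypothetical helper of mine, not a TraMineR function, and it simply ignores the day of the month. I am assuming the resulting integer columns can then be passed to seqformat as begin and end in place of the Date columns.

```r
# Hypothetical helper: number of the calendar month of d, counting the
# month of origin as 1 (the day of the month is ignored)
month_index <- function(d, origin) {
  d <- as.POSIXlt(d)
  o <- as.POSIXlt(origin)
  (d$year - o$year) * 12 + (d$mon - o$mon) + 1
}

# Spell dates copied from ne2 in the question
date_start <- as.Date(c(12432, 15262, 15385, 12432, 12432, 12432, 14318, 12432),
                      origin = "1970-01-01")
date_end   <- as.Date(c(15262, 15385, 16450, 16450, 16450, 14318, 16450, 16450),
                      origin = "1970-01-01")

# Monthly begin and end positions relative to the first observed month
origin  <- min(date_start)
begin_m <- month_index(date_start, origin)
end_m   <- month_index(date_end, origin)

range(c(begin_m, end_m))  # 1 133
```

The incidence dates would need the same conversion before computing the shift, so that the shift is also expressed in months.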