I came across a wonderful figure that summarizes collaboration between (scientific) authors over the years. The figure is pasted below.
Each vertical line refers to a single author. The start of each vertical line corresponds to the year in which that author received her first collaborator (i.e., when she became active and thus part of the collaboration network). Authors are ranked according to the total number of collaborators they have in the last year (i.e., in 2010). The coloring denotes how the number of collaborators of each author increased over the years (from the time she became active until 2010).
I have a similar dataset, but with keywords instead of authors. Each numerical value denotes the frequency of a term in a particular year. The data looks like this:
Year Term1 Term2 Term3 Term4
1966 0 1 1 4
1967 1 5 0 0
1968 2 1 0 5
1969 5 0 0 2
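To make the mapping between this table and the figure concrete, here is a minimal base R sketch. It assumes the table is whitespace-separated exactly as shown, and the names `terms`, `first_year` and `total` are only illustrative. It computes the year each term first appears (where its vertical line would start) and its total frequency over all years (one possible ranking criterion).

```r
# Sample data from above, entered inline (assumed whitespace-separated)
df <- read.table(text = "
Year Term1 Term2 Term3 Term4
1966 0 1 1 4
1967 1 5 0 0
1968 2 1 0 5
1969 5 0 0 2
", header = TRUE)

# All columns except Year hold term frequencies
terms <- df[, colnames(df) != "Year"]

# First year with a non-zero frequency for each term
# (NA if a term never appears)
first_year <- sapply(terms, function(x) df$Year[which(x > 0)[1]])

# Total frequency per term over the whole period
total <- colSums(terms)

first_year
total
```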
The graph looked quite nice, so I tried to reproduce it. It turned out to be a bit more complicated than I thought.
    library(reshape)   # melt()
    library(dplyr)     # group_by(), summarise(), arrange()
    library(ggplot2)   # plotting

    # assumes test_data.txt is comma-separated; drop sep for whitespace-separated data
    df = read.table("test_data.txt", header = TRUE, sep = ",")

    # turn 0 into NA until the first value > 0, then keep the values
    df2 = data.frame(Year = df$Year,
                     sapply(df[, !colnames(df) == "Year"],
                            function(x) ifelse(cumsum(x) == 0, NA, x)))

    # turn the data frame into long format
    molten = melt(df2, id.vars = "Year")

    # Create a new value to measure the increase over time: I used a log scale
    # to avoid a few classes overshadowing the others.
    # The increase is measured as the cumsum; ave() is used to get cumsum to
    # work with NAs and tapply() to group on "variable"
    molten$inc = log(Reduce(c, tapply(molten$value, molten$variable,
                                      function(x) ave(x, is.na(x), FUN = cumsum))) + 1)

    # reorder variable according to max increase:
    # this data frame is sorted in descending order of the maximum increase
    df_order = molten %>%
      group_by(variable) %>%
      summarise(max_inc = max(na.omit(inc))) %>%
      arrange(desc(max_inc))

    # this changes the levels of variable so that the variables are ranked in the plot
    molten$variable <- factor(molten$variable, levels = df_order$variable)

    # plot
    ggplot(molten) +
      theme_void() +  # removes axes, background, etc.
      geom_line(aes(x = variable, y = Year, colour = inc), size = 2) +
      theme(axis.text.y = element_text()) +
      scale_color_gradientn(colours = c("red", "green", "blue"), na.value = "white")  # set the colour gradient
Not as nice as in the paper, but it's a start.
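To see what the masking and the log-cumulative "increase" do to the numbers, here is a small base R sketch on the four-year sample from the question. It does not need reshape, dplyr or ggplot2. The helper names `mask_inactive` and `log_increase` are made up for illustration. The sketch is meant to mirror the logic of the code above (leading zeros become NA, then log of the running total plus one), not to replace it.

```r
# Sample frequencies from the question, typed in directly
df <- data.frame(
  Year  = 1966:1969,
  Term1 = c(0, 1, 2, 5),
  Term2 = c(1, 5, 1, 0),
  Term3 = c(1, 0, 0, 0),
  Term4 = c(4, 0, 5, 2)
)

# Years before a term first appears (running total still 0) become NA
mask_inactive <- function(x) ifelse(cumsum(x) == 0, NA, x)

# log(running total + 1) for the active years, NA for the inactive ones
log_increase <- function(x) {
  masked <- mask_inactive(x)
  out <- rep(NA_real_, length(masked))
  active <- !is.na(masked)
  out[active] <- log(cumsum(masked[active]) + 1)
  out
}

# One column of "increase" values per term, one row per year
inc <- sapply(df[, colnames(df) != "Year"], log_increase)
rownames(inc) <- df$Year

# Terms ordered by their maximum increase, largest first
ranking <- names(sort(apply(inc, 2, max, na.rm = TRUE), decreasing = TRUE))

round(inc, 2)
ranking
```

The NA entries are what `na.value = "white"` hides in the plot, so each line only becomes visible from the year the term first occurs.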