E B - 1 month ago
R Question

R Using cut function on dates defined as Number and format of the breaks

I have a data frame with years and runtimes:

DF = data.frame(Year = c(1800,1892,1910,2000,2004),Runtimes=c(80,10,15,10,30))

Year Runtimes
1 1800 80
2 1892 10
3 1910 15
4 2000 10
5 2004 30

I am using cut() to create 10-year bins over the range of years I have, and then plotting the resulting frequency distribution with ggplot. What I notice is that after cut(), the year values, which are stored as numeric, are shown in scientific notation rather than as 4-digit years.

Is there a way to keep the years in a more readable format like [1890,1900) instead of scientific notation?

Here is the code that I have been playing with:

library(ggplot2)

yr_bins = seq(1800,2010,10)
rt_yr = cut(DF$Year,breaks=yr_bins,right=FALSE)
yr_freq_table = transform(table(rt_yr))
ggplot(yr_freq_table) +
  geom_bar(aes(x=rt_yr,y=Freq), fill="lightblue",color="lightslategray",
           position="stack",stat="identity") +
  ylab("Count Year (mins)") +
  scale_x_discrete(drop=F) +
  theme(axis.text.x=element_text(angle=90, vjust=.5, hjust=1)) +
  ggtitle("Runtime Distribution")

The first rows of yr_freq_table look like this:

  rt_yr               Freq
1 [1.8e+03,1.81e+03)  1
2 [1.81e+03,1.82e+03) 0
3 [1.82e+03,1.83e+03) 0

UPDATE: The issue that I am trying to solve is to show rt_yr in ggplot as readable 10-year ranges rather than in scientific notation.


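You can use the dig.lab argument of cut to prevent scientific notation. By default cut formats the break labels with three significant digits (dig.lab=3), which is why four-digit years come out as 1.8e+03; setting dig.lab=4 keeps them as plain years. For example: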

rt_yr = cut(DF$Year, breaks=yr_bins, right=FALSE, dig.lab=4)
yr_freq_table = transform(table(rt_yr))  # rebuild the frequency table from the relabelled factor

ggplot(yr_freq_table) + 
  geom_bar(aes(x=rt_yr, y=Freq), fill="lightblue", color="lightslategray", 
           stat="identity") +
  labs(y="Count Year (mins)") + 
  scale_x_discrete(drop=F) + 
  theme(axis.text.x=element_text(angle=90, vjust=.5, hjust=1)) + 
  ggtitle("Runtime Distribution")
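
To check the effect of dig.lab without plotting, you can compare the factor levels directly. This is a small sketch using the DF and yr_bins from the question; the names default_bins and readable_bins are just illustrative:

```r
# Data and breaks as defined in the question
DF <- data.frame(Year = c(1800, 1892, 1910, 2000, 2004),
                 Runtimes = c(80, 10, 15, 10, 30))
yr_bins <- seq(1800, 2010, 10)

# Default dig.lab (3 significant digits) vs. dig.lab = 4
default_bins  <- cut(DF$Year, breaks = yr_bins, right = FALSE)
readable_bins <- cut(DF$Year, breaks = yr_bins, right = FALSE, dig.lab = 4)

# Compare the first few labels produced by each call
print(head(levels(default_bins), 3))   # scientific notation, as in the question
print(head(levels(readable_bins), 3))  # "[1800,1810)" "[1810,1820)" "[1820,1830)"
```

Only the labels change; both factors assign every year to the same bin.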


If you want the labels formatted a specific way, you can also set the labels yourself using the labels argument. For example, let's say we prefer a hyphen separator instead of a comma:

rt_yr = cut(DF$Year,breaks=yr_bins, 
        labels=paste0("[", yr_bins[-length(yr_bins)], "-", yr_bins[-1], ")"),
        right=FALSE)

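
When you supply labels yourself, cut needs exactly one label per interval, that is, one fewer than the number of breaks. The sketch below builds the hyphenated labels step by step and tabulates the result; lower, upper and bin_labels are illustrative names, and DF and yr_bins are the objects from the question:

```r
# Data and breaks as defined in the question
DF <- data.frame(Year = c(1800, 1892, 1910, 2000, 2004),
                 Runtimes = c(80, 10, 15, 10, 30))
yr_bins <- seq(1800, 2010, 10)

# Lower and upper edge of each interval
lower <- head(yr_bins, -1)   # all breaks except the last
upper <- tail(yr_bins, -1)   # all breaks except the first
bin_labels <- paste0("[", lower, "-", upper, ")")

# cut() requires one label per interval
stopifnot(length(bin_labels) == length(yr_bins) - 1)

rt_yr <- cut(DF$Year, breaks = yr_bins, labels = bin_labels, right = FALSE)
yr_freq_table <- as.data.frame(table(rt_yr))

# Show only the bins that actually contain data
print(yr_freq_table[yr_freq_table$Freq > 0, ])
```

Keep right=FALSE in sync with the brackets in your labels: "[a-b)" only describes the bins correctly when the intervals are closed on the left.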