Evechinus, 1 month ago (R question)

R: Function to ID duplicated dates and jitter months to get sequential months within a year

Long-time lurker, first-time poster, and still a bumbling R beginner.

I would like some help "jittering" months in R ("jitter" may not be the best description).

The full data set I am working with has 10,000 rows and 30 columns. It contains 40 sites, with start dates ranging from 1986 to 2012, and monthly samples were collected at each site up to December 2015. Some dates (samples) are missing, and these are simply absent from the data set. Therefore, for any given site, a year may or may not have 12 months (samples).

Below is an example data set. The dates I am after are shown in the req.date data frame, which has sequential months:

wq <- data.frame(site = c(rep("A", 5), rep("B", 5)),
                 date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                                  "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                                  "01/07/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y"),
                 year = c(rep("2012", 5), rep("2016", 5)),
                 month = c(6, 7, 7, 9, 10, 4, 5, 7, 7, 8))


req.date <- data.frame(req.date =
                         as.Date(c("23/06/2012", "01/07/2012", "26/08/2012",
                                   "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                                   "01/06/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y"))


I created the month and year columns only to make my question easier to follow; they are not needed in my final data set.
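Because year and month are only helpers, they can be derived from `date` on the fly with base R's `format()`. As a sketch on the example data, the block below also counts samples and distinct months per site and year; a positive difference flags a site-year with at least one duplicated month. The names `n_samples` and `n_months` are illustrative only, not part of the question's code.

```r
# Example data from the question (site and date only)
wq <- data.frame(
  site = c(rep("A", 5), rep("B", 5)),
  date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                   "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                   "01/07/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y")
)

# Derive year and month from the date instead of storing them as columns
year <- format(wq$date, "%Y")
month <- as.integer(format(wq$date, "%m"))

# Samples and distinct months per site and year (NA where a site has no data)
n_samples <- tapply(month, list(wq$site, year), length)
n_months <- tapply(month, list(wq$site, year), function(m) length(unique(m)))

# A positive difference means that site-year has a duplicated month
n_samples - n_months
```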

What I would like to know is how to "jitter" the month part of wq$date (by +/- one month) where a month is duplicated. I am only interested in adjusting the months; I am not concerned about the exact day.

I found this add.Month function (Add a month to a Date), but would appreciate help with a function that adjusts wq$date, taking into account the months in the rows above and below the duplicated date.
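A month-shifting helper can be written in base R with `seq()`, in the same spirit as that add.Month function. The sketch below is hypothetical (the name `shift_month` is mine, and it handles one date at a time). It also clamps the day to the last day of the target month, because a plain `seq(date, by = "1 month", ...)` starting on 31 January spills over into March.

```r
# Hypothetical helper: shift a single Date by n whole months.
# The day is clamped to the last day of the target month.
shift_month <- function(d, n) {
  first <- as.Date(format(d, "%Y-%m-01"))                  # first day of d's month
  target <- seq(first, by = paste(n, "months"), length.out = 2)[2]
  next_first <- seq(target, by = "1 month", length.out = 2)[2]
  last_day <- as.integer(format(next_first - 1, "%d"))     # days in the target month
  day <- min(as.integer(format(d, "%d")), last_day)
  target + (day - 1)
}

shift_month(as.Date("2012-07-26"), 1)   # "2012-08-26"
shift_month(as.Date("2016-07-01"), -1)  # "2016-06-01"
shift_month(as.Date("2012-01-31"), 1)   # "2012-02-29" (clamped)
```

Since only the month matters here, the clamping is a convenience rather than a requirement.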

I have identified the duplicated date(s), grouped by site and year:

wq$dup <- duplicated(wq[, c("site", "year", "month")])  # TRUE for the 2nd and later rows of each site/year/month
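`duplicated()` marks only the second and later rows of each group, so only one row of each pair is flagged. Where both rows of a pair are needed (for example to decide which one to move), adding `fromLast = TRUE` flags every member. A sketch on the example data, using a single site plus year-month key as an equivalent of the site, year and month columns:

```r
# Example data from the question (site and date only)
wq <- data.frame(
  site = c(rep("A", 5), rep("B", 5)),
  date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                   "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                   "01/07/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y")
)

key <- paste(wq$site, format(wq$date, "%Y-%m"))   # one key per site and month
wq$dup <- duplicated(key)                          # later copies only
wq$dup_any <- duplicated(key) | duplicated(key, fromLast = TRUE)  # every copy

which(wq$dup)      # 3 9
which(wq$dup_any)  # 2 3 8 9
```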


But now I am unsure how to write a function for the last step.
I will use my really poor R coding skills to attempt a solution (and I apologise for my lack of skill here!)

# Work on a copy of the dates; use wq$month to make the comparisons easier.
# Assumes wq is sorted by site and date, as in the example.
add.month <- function(date, n) seq(date, by = paste(n, "months"), length.out = 2)[2]
wq$new.date <- wq$date
for (i in which(wq$dup)) {
  if (i < 3) next                          # need a month 2 rows up to compare with
  gap <- wq$month[i] - wq$month[i - 2]     # months between the duplicate and 2 rows up
  if (gap == 1) {
    # duplicate is 1 month after the month 2 rows up: the duplicate needs +1 month
    wq$new.date[i] <- add.month(wq$new.date[i], 1)
  } else if (gap == 2) {
    # duplicate is 2 months after the month 2 rows up: the row above needs -1 month
    wq$new.date[i - 1] <- add.month(wq$new.date[i - 1], -1)
  }
  # otherwise the dates are left as they are
}


Any help would be greatly appreciated!

Update:

I need to identify the duplicate months (which I have done), then look at the sequence of months within the year to decide whether a duplicate month needs to be adjusted (by +/- 1 month) to complete the month sequence for that site and year. For example, in the data frame above, site A has a duplicate month: 01/07/2012 and 26/07/2012. The month sequence for site A is currently (6, 7, 7, 9, 10); the correct sequence should be (6, 7, 8, 9, 10). For site B the duplicate month is 01/07/2016 and 30/07/2016. The month sequence for site B is currently (4, 5, 7, 7, 8); the correct sequence should be (4, 5, 6, 7, 8). I need a function to correct the month sequences.
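One way to generalise the rule from the attempt above (rather than always looking exactly 2 rows up) is to check, for each pair of samples at the same site that fall in the same month, whether the month after the pair is free and, failing that, whether the month before it is free. The function below is a hypothetical sketch of that idea, not a tested solution for the full data set. It assumes at most two samples share a month, and because months are compared as a running year * 12 + month index per site, a December duplicate can be moved into January of the next year if that month has no sample.

```r
# Hypothetical helper: shift a Date by n months (the day may spill over for 29-31)
add_months <- function(d, n) seq(d, by = paste(n, "months"), length.out = 2)[2]

# Running month index (year * 12 + month) so months compare across years
month_index <- function(d) {
  as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
}

fix_months <- function(df) {
  df <- df[order(df$site, df$date), ]
  for (s in unique(df$site)) {
    rows <- which(df$site == s)
    for (k in seq_along(rows)[-1]) {
      i <- rows[k]       # later sample of a possible duplicate pair
      j <- rows[k - 1]   # earlier sample
      if (month_index(df$date[i]) != month_index(df$date[j])) next
      taken <- month_index(df$date[rows])   # months currently sampled at site s
      if (!(month_index(df$date[i]) + 1L) %in% taken) {
        df$date[i] <- add_months(df$date[i], 1)    # free month after: move later sample forward
      } else if (!(month_index(df$date[j]) - 1L) %in% taken) {
        df$date[j] <- add_months(df$date[j], -1)   # free month before: move earlier sample back
      }
      # if neither neighbouring month is free, the pair is left unchanged
    }
  }
  df
}

# Example data from the question (site and date only)
wq <- data.frame(
  site = c(rep("A", 5), rep("B", 5)),
  date = as.Date(c("23/06/2012", "01/07/2012", "26/07/2012",
                   "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                   "01/07/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y")
)
fixed <- fix_months(wq)
format(fixed$date, "%d/%m/%Y")
```

On the example data this gives the same dates as req.date: 26/07/2012 moves to August, and for site B 01/07/2016 moves back to June because August is already taken.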

Answer

This code finds all dates that share the same year and month (duplicates). It then checks whether there is a sample one month before and one month after each duplicate. If either of those months has no sample, the duplicate's date is replaced by the first missing month (the month before is checked first).

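# Shift a Date by n months (the day is kept; it can spill over for days 29-31)
add.months <- function(date, n) seq(date, by = paste(n, "months"), length.out = 2)[2]

# One key per site and year-month, so samples from different sites do not clash
months <- paste(wq$site, format(wq$date, "%Y-%m"))
pos <- which(duplicated(months))

for (z in pos) {
  # the dates one month before and one month after the duplicate
  neighbour <- c(add.months(wq$date[z], -1), add.months(wq$date[z], 1))
  neighbour.key <- paste(wq$site[z], format(neighbour, "%Y-%m"))
  free <- which(!neighbour.key %in% months)   # neighbouring months with no sample
  if (length(free) >= 1) {
    wq$date[z] <- neighbour[free[1]]          # move to the first free month
    months[z] <- neighbour.key[free[1]]       # keep the lookup up to date
  }
}

wq <- wq[with(wq, order(site, date)), ]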
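Whichever approach is used, it can help to confirm afterwards that no site still has two samples in one month, and to print the month sequence per site for comparison with the expected one. A small sketch (the helper names `no_dup_months` and `month_sequence` are hypothetical), shown on the question's target dates from req.date paired with their sites:

```r
# TRUE when no site has two samples in the same calendar month
no_dup_months <- function(df) {
  !any(duplicated(paste(df$site, format(df$date, "%Y-%m"))))
}

# Month sequence per site, in date order
month_sequence <- function(df) {
  df <- df[order(df$site, df$date), ]
  tapply(as.integer(format(df$date, "%m")), df$site, paste, collapse = ", ")
}

# The question's target dates (req.date) with their sites
target <- data.frame(
  site = c(rep("A", 5), rep("B", 5)),
  date = as.Date(c("23/06/2012", "01/07/2012", "26/08/2012",
                   "05/09/2012", "23/10/2012", "01/04/2016", "08/05/2016",
                   "01/06/2016", "30/07/2016", "05/08/2016"), format = "%d/%m/%Y")
)
no_dup_months(target)   # TRUE
month_sequence(target)  # A = "6, 7, 8, 9, 10", B = "4, 5, 6, 7, 8"
```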