Jacek Kotowski Jacek Kotowski - 3 months ago 14
R Question

r-dplyr equivalent of sql query returning monthly utilisation of contracts

My test data looks as follows:

id<-c("aaa","bbb","aaa")
start<-as.Date (c("2016-01-23", "2016-02-24", "2016-01-23", "2016-10-23", "2016-08-24", "2016-04-23"))
duration<-as.numeric(3,3,3,6,3,3)
value<-as.numeric(20,100,20, 100,200,50)

df<-data.frame(cbind(id, start,duration, value))


id - corresponds to id of a contractor

start - start date

duration - how long the contract will last in months

value - the total value of contract, (the value will be divided by duration and aggregated by contractor to get a monthly value of all contracts per contractor).

What I would like to achieve is have a monthly and a quarterly total (total/duration in months if a contract is running in the given month), aggregated by id

Here is a sample done in Access SQL

SELECT id,
"Sum (iif([Start_Date] < Dateserial(" & InpYear & ",01,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",01,01) ,[Value]/[Dur_mths],0)) AS [Jan],
"Sum (iif([Start_Date] < Dateserial(" & InpYear & ",02,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",02,01) ,[Value]/[Dur_mths],0)) AS [Feb],
"Sum (iif([Start_Date] < Dateserial(" & InpYear & ",03,01) AND DateAdd('m',[Dur_mths], [Start_Date]) >= Dateserial(" & InpYear & ",03,01) ,[Value]/[Dur_mths],0)) AS [Mar],

[Jan]+[Feb]+[Mar]) AS [_Q1_],


(the code repeats for Q2 through Q4, the year is entered by user)

GROUP BY id

I am wondering how the translation would look like into dplyr in r.
I started with testing operations on dates, eg March 2013 required(lubridate)

required(lubridate)
(start)< as.Date ("2016-03-01") & start %m+% months(duration) >= as.Date("2016-03-01")


to get [1] TRUE TRUE TRUE FALSE FALSE FALSE

I am puzzled however


  • how to translate the conditions (iif) into dplyr and

  • if it will be
    possible in the select to receive the quarterly total immediately in
    the select line.



I am sorry that I did not attempt anything more complicated but I do not know even where to start. I tried, eg. a filter function

df %>%
filter(start< as.Date ("2016-03-01") & start %m+% months(duration) >= as.Date("2016-03-01"))


only to get a bunch of errors... I am in the dark. I will be very grateful for all the hints.

Answer

You get lubridate errors with working with certain date time formats. It works if you remove as.Date and %m+%.

df %>% 
  filter(start< "2016-03-01" & 
           start + months(duration) >="2016-03-01")
Comments