Dave Jarvis Dave Jarvis - 4 months ago 15
PHP Question

Calculate year for end date: PostgreSQL

Background



Users can pick dates as shown in the following screen shot:



Any starting month/day and ending month/day combinations are valid, such as:


  • Mar 22 to Jun 22

  • Dec 1 to Feb 28



The second combination is difficult (I call it the "tricky date scenario") because the year for the ending month/day should come after the year for the starting month/day. That is to say, for the year 1900 (also shown selected in the screen shot above), the full dates would be:


  • Dec 22, 1900 to Feb 28, 1901

  • Dec 22, 1901 to Feb 28, 1902

  • ...

  • Dec 22, 2007 to Feb 28, 2008

  • Dec 22, 2008 to Feb 28, 2009



Problem



Writing a SQL statement that selects values from a table with dates that fall between the start month/day and end month/day, regardless of how the start and end days are selected. In other words, this is a year wrapping problem.

Inputs



The query receives as parameters:


  • Year1, Year2: The full range of years, independent of month/day combination.

  • Month1, Day1: The starting day within the year to gather data.

  • Month2, Day2: The ending day within the year (or the next year) to gather data.



Previous Attempt



Consider the following MySQL code (that worked):

end_year = start_year +
greatest( -1 *
sign(
datediff(
date(
concat_ws('-', year, end_month, end_day )
),
date(
concat_ws('-', year, start_month, start_day )
)
)
), 0
)


How it works, with respect to the tricky date scenario:


  1. Create two dates in the current year.

  2. The first date is Dec 22, 1900 and the second date is Feb 28, 1900.

  3. Count the difference, in days, between the two dates.

  4. If the result is negative, it means the year for the second date must be incremented by 1. In this case:

    • Add 1 to the current year.

    • Create a new end date: Feb 28, 1901.

    • Check to see if the date range for the data falls between the start and calculated end date.


  5. If the result is positive, the dates have been provided in chronological order and nothing special needs to be done.



This worked in MySQL because the difference in dates would be positive or negative.In PostgreSQL, the equivalent functionality always returns a positive number, regardless of their relative chronological order. (My tests of subtracting dates may have been incorrect, though.)

Question



How should the following (broken) code be rewritten for PostgreSQL to take into consideration the relative chronological order of the starting and ending month/day pairs (with respect to introducing that tricky annual temporal displacement)?

SELECT
m.amount
FROM
measurement m
WHERE
(extract(MONTH FROM m.taken) >= month1 AND
extract(DAY FROM m.taken) >= day1) AND
(extract(MONTH FROM m.taken) <= month2 AND
extract(DAY FROM m.taken) <= day2)


Any thoughts, comments, or questions?

(The dates are pre-parsed into MM/DD format in PHP. My preference is for a pure PostgreSQL solution, but I am open to suggestions on what might make the problem simpler using PHP. The SQL lives in a stored procedure that is called by JasperReports, so the only items PHP can touch are the dates passed into the reporting engine.)

Update #1



The following code nearly works:

select
(extract(YEAR FROM m.taken)||'-12-12')::date as start_date,
((extract(YEAR FROM m.taken)+
greatest(
0,
sign(
(extract(YEAR FROM m.taken)||'-12-12')::date -
(extract(YEAR FROM m.taken)||'-02-01')::date) ))||'-02-01')::date as end_date,
m.taken,
extract(YEAR FROM m.taken) as year_taken
from
measurement m
where
m.station_id = 200 AND
m.category_id = 1 AND
(m.taken, m.taken) OVERLAPS
((extract(YEAR FROM m.taken)||'-12-12')::date,
((extract(YEAR FROM m.taken)+
greatest(
0,
sign(
(extract(YEAR FROM m.taken)||'-12-12')::date -
(extract(YEAR FROM m.taken)||'-02-01')::date) ))||'-02-01')::date)


It produces the correct start and end date constraints:

start_date ;end_date ;taken ;year_taken
"1969-12-12";"1970-02-01";"1969-12-12";1969
"1969-12-12";"1970-02-01";"1969-12-13";1969
...
"1969-12-12";"1970-02-01";"1969-12-31";1969
"1970-12-12";"1971-02-01";"1970-12-12";1970


However, the values between
1970-01-01
and
1970-02-01
are missing, but must be included.

Versions



PostgreSQL 8.4.4 and PHP 5.2.10

Answer
SELECT
    m.amount
FROM
    measurement m,
    to_date('Dec 01 1900', 'Mon DD YYYY') AS A(d1),
    to_date('Feb 28 1900', 'Mon DD YYYY') AS B(d2),
    to_date('Feb 28 1901', 'Mon DD YYYY') AS C(d3)
WHERE m.taken
    BETWEEN
        d1 AND
        CASE WHEN d2 < d1 THEN d3 ELSE d2 END

References: conditional expressions, data type formatting functions.

EDIT: Sorry, I thought you wanted a specific year:

SELECT
    amount
FROM
    (SELECT
        M.amount, M.taken,
        to_date('Dec 01 ' || extract(YEAR FROM M.taken), 'Mon DD YYYY'),
        to_date('Feb 28 ' || extract(YEAR FROM M.taken), 'Mon DD YYYY')
     FROM
        measurement AS M
     ) AS A(amount, taken, d1, d2)
WHERE
    (d2 >= d1 AND taken BETWEEN d1 AND d2)
    OR
    (d2 < d1 AND (taken <= d2 OR taken >= d1));

If the set is big, this doesn't have a lot of chances for optimization. In that case, you can have an SQL function that converts all the dates to a certain year (say taken - (extract(year from taken) - 1900) * '1 year'::interval) and then compare with Dec 02 1900 and Feb 28 1900. That way you can index the result of this date conversion function AND you don't have to calculate two dates for each entry.