Mike Furlender Mike Furlender - 24 days ago 8
MySQL Question

MySQL: Group by date proximity?

I wrote this query, it does almost what I want:

SELECT * FROM
(
SELECT COUNT(*) as cnt,
lat,
lon,
elev,
GROUP_CONCAT(CONCAT(usaf,'-',wban))
FROM `ISH-HISTORY_HASPOS`
GROUP BY lat,lon,elev
) AS x WHERE cnt >=1;


output:

+-----+--------+----------+--------+-------------------------------------------------+
| cnt | lat | lon | elev | GROUP_CONCAT(CONCAT(usaf,'-',wban)) |
+-----+--------+----------+--------+-------------------------------------------------+
| 4 | 30.478 | -87.187 | 36 | 722220-13899,722221-13899,722223-13899,999999-13899 |
| 4 | 36.134 | -80.222 | 295.7 | 723190-93807,723191-93807,723193-93807,999999-93807 |
| 5 | 37.087 | -84.077 | 369.1 | 723290-03849,723291-03849,723293-03849,724243-03849,999999-03849 |
| 5 | 38.417 | -113.017 | 1534.1 | 745200-23176,745201-23176,999999-23176,724757-23176,724797-23176 |
| 4 | 40.217 | -76.851 | 105.8 | 999999-14751,725110-14751,725111-14751,725118-14751 |
+-----+--------+----------+--------+-------------------------------------------------+


This returns a concatenated list of stations that are located at identical coordinates. However, I am only interested in concatenating stations with adjoining date ranges. The table that I select from (ISH-HISTORY_HASPOS) has two datetime columns : 'begin' and 'end'. I need the values for these two columns to be within 3 days of each other to satisfy the GROUP_CONCAT conditions.

Edit: In order for a station to be included in the final result's GROUP_CONCAT it must satisfy the following conditions:


  1. It must be co-located with another station in the list (group by
    lat,lon,elev)

  2. Its
    end
    time must be within 3 days of another station's
    begin
    time OR its
    begin
    time must be within 3 days of another station's
    end
    time. When I say "another station", I am referring to stations
    that are co-located (meet the conditions for #1).



I figure that I will have to use a subquery but I can't seem to figure out how to do it. Some help would be greatly appreciated! Either a query or a stored procedure would be great but a php solution would also be acceptable.

Here is a dump of the table that I am querying:sql dump

The results should look the same as my example, but non-adjoining items (date-wise) should not be there.

Answer Source

A solution could be using a subquery to compute the list of station within 3 days of each other and adding this subquery as a where clause to the main query. The subquery consists of a cartesian product to list all possible station couples with a first condition to get just the first half of the resulting matrix and two conditions to specify the time constraints. As to these latter conditions I just guessed them, I don't really know the begin and end fields unit of measure. The resulting query could be this:

SELECT * FROM (
    SELECT COUNT(*) AS
       cnt,
       lat,
       lon,
       elev,
    GROUP_CONCAT(CONCAT(usaf, '-', wban))  
    FROM ISH-HISTORY_HASPOS  
    WHERE id IN (
        SELECT DISTINCT t1.id  
        FROM ISH-HISTORY_HASPOS t1  
        INNER JOIN ISH-HISTORY_HASPOS t2
           ON t1.lon = t2.lon
           AND t1.lat = t2.lat
           AND t1.elev = t2.elev  
        WHERE t1.id < t2.id
            AND abs(t1.begin - t2.end) < 259200
            AND abs(t1.end - t2.begin) < 259200  
        UNION  
        SELECT DISTINCT t2.id  
        FROM ISH-HISTORY_HASPOS t1  
        INNER JOIN ISH-HISTORY_HASPOS t2
            ON t1.lon = t2.lon
            AND t1.lat = t2.lat
            AND t1.elev = t2.elev  
        WHERE t1.id < t2.id
            AND abs(t1.begin - t2.end) < 259200
            AND abs(t1.end - t2.begin) < 259200
    )
    GROUP BY lat, lon, elev   
) AS x WHERE cnt >= 1;