suneesh suneesh - 6 months ago 10
SQL Question

group by multiple columns to get common entries for month

Please see the following MySQL table.

-

-
-- Table structure for table `sample_table`
--

CREATE TABLE IF NOT EXISTS `sample_table` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`station` varchar(300) NOT NULL,
`city` varchar(300) NOT NULL,
`reviewcount` int(6) NOT NULL,
`result_month` varchar(10) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;

INSERT INTO `sample_table`(`id`, `station`, `reviewcount`, `result_month`) VALUES
(1, 'A','NewYork', 10, 'Jan'),
(2, 'B','NewYork', 12, 'Jan'),
(3, 'C','Florida', 14, 'Jan'),
(4, 'A','NewYork', 12, 'Feb'),
(5, 'B','NewYork', 14, 'Feb'),
(6, 'C','Florida', 20, 'Feb'),
(7, 'D','Washington', 12, 'Feb'),
(8, 'A','NewYork', 14, 'Mar'),
(9, 'B','NewYork', 14, 'Mar'),
(10, 'C','Florida', 28, 'Mar');


Here I need to get the same station information over a range of periods.

I.e, If we select range From January to March,the stations with result rows for each months should be displayed.

Here only stations A and C is having review counts for every months in the selected range,i.e from Jan to Mar.

Hence I need to get results like

A - Jan - 10
C - Jan - 14

A - Feb - 12
C - Feb - 20

A - Mar - 14
C - Mar - 28


I tried GROUP BY station,result_month and also tried JOIN of table,But it does not help.

I am working with a huge table with thousands of stations and I need the query to be optimized too.

Second Section

I need to get city wise aggregate information too,
Where the output should look like,

City -Station Count - Review Count

NewYork - 2 - 22 -Jan
Florida - 1 - 14 - Jan

NewYork - 2 - 26 -Jan
Florida - 1 - 10 - Jan

NewYork - 2 - 28 -Jan
Florida - 1 - 28 - Jan


Any help will be highly appreciated.Thanx in advance.

Answer

The first stage is to get stations that have an entry in every month:

SELECT  station
FROM    sample_table
GROUP BY station
HAVING COUNT(DISTINCT result_month) = 
            (SELECT COUNT(DISTINCT result_month) FROM sample_table);

This will get you A and C (Example on SQL Fiddle).

You can then join this back to your main table:

SELECT  t.station, t.reviewcount, t.result_month
FROM    sample_table t
        INNER JOIN
        (   SELECT  station
            FROM    sample_table
            GROUP BY station
            HAVING COUNT(DISTINCT result_month) = 
                        (SELECT COUNT(DISTINCT result_month) FROM sample_table)
        ) t2
            ON t2.station = t.station;

Example on SQL Fiddle


EDIT

Your second requirement is a little unclear, I am not sure at which point you want to group the cities together, i.e. if your data was as follows:

(1, 'A','NewYork', 10, 'Jan'),
(2, 'B','NewYork', 10, 'Jan'),
(3, 'C','NewYork', 10, 'Jan'),
(4, 'A','NewYork', 10, 'Feb'),
(5, 'B','NewYork', 10, 'Feb'),
(6, 'A','NewYork', 10, 'Mar'),
(7, 'B','NewYork', 10, 'Mar');

Should the sum include station C because New York has an entry for every month? So January would have a total of 30. In which case your query would be:

SELECT  t.City, SUM(t.reviewcount) AS reviewcount, t.result_month
FROM    sample_table t
        INNER JOIN
        (   SELECT  City
            FROM    sample_table
            GROUP BY City
            HAVING COUNT(DISTINCT result_month) = 
                        (SELECT COUNT(DISTINCT result_month) FROM sample_table)
        ) t2
            ON t2.City = t.City
GROUP BY t.City, t.result_month;

Example on SQL Fiddle

Or should it only include A and B because these stations have an entry every month? So Jan only has a total of 20. In which your query would be

SELECT  t.City, SUM(t.reviewcount) AS reviewcount, t.result_month
FROM    sample_table t
        INNER JOIN
        (   SELECT  station
            FROM    sample_table
            GROUP BY station
            HAVING COUNT(DISTINCT result_month) = 
                        (SELECT COUNT(DISTINCT result_month) FROM sample_table)
        ) t2
            ON t2.station = t.station
GROUP BY t.City, t.result_month;

Example on SQL Fiddle