Tomas Tomas - 2 months ago 11
SQL Question

SQL/Vertica - grouping multi-attribute combinations

I have the following type of dataset:

user_id country1 city1 country2 city2
1 usa new york france paris
2 usa dallas japan tokyo
3 india mumbai italy rome
4 france paris usa new york
5 brazil sao paulo russia moscow


I want to group the combinations of
country1
,
city1
,
country2
and
city2
where the order (is something
country1
or
country2
) should not matter. Usually, I would try:

SELECT country1
, city1
, country2
, city2
, COUNT(*)
FROM dataset
GROUP BY country1
, city1
, country2
, city2


However, this code snippet considers the rows with
user_id=1
and
user_id=4
as two separate cases where I would like them to be considered as equivalent.

Anyone who knows how to coop with this problem?

Thanks in advance!

Answer

Normally, you approach this type of problem using least() and greatest(), but you have two columns, instead of one. So, let's do it by comparing cities. I am guessing that city is more unique than country:

select (case when city1 < city2 then country1 else country2 end) as country1,
       (case when city1 < city2 then city1 else city2 end) as city1,
       (case when city1 < city2 then country2 else country1 end) as country2,
       (case when city1 < city2 then city2 else city1 end) as city2,
       count(*)
from dataset
group by (case when city1 < city2 then country1 else country2 end),
       (case when city1 < city2 then city1 else city2 end),
       (case when city1 < city2 then country2 else country1 end),
       (case when city1 < city2 then city2 else city1 end)