SQL/Vertica - grouping multi-attribute combinations
I have a dataset like the following:
user_id  country1  city1      country2  city2
1        usa       new york   france    paris
2        usa       dallas     japan     tokyo
3        india     mumbai     italy     rome
4        france    paris      usa       new york
5        brazil    sao paulo  russia    moscow
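I want to group by the combination of country1, city1, country2 and city2, where the order (whether a location appears as country1/city1 or as country2/city2) should not matter. Normally, I would try: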
SELECT country1
, city1
, country2
, city2
, COUNT(*)
FROM dataset
GROUP BY country1
, city1
, country2
, city2
However, this snippet treats the rows with user_id=1 and user_id=4 as two different cases, whereas I want them to be treated as equivalent.
Does anyone know how to solve this?
Thanks in advance!
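Normally you would use least() and greatest() for this kind of problem, but here each side of the pair is two columns (country and city), not one. So let's do it by comparing the cities. I'm guessing city is more distinctive than country: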
select (case when city1 < city2 then country1 else country2 end) as country1,
(case when city1 < city2 then city1 else city2 end) as city1,
(case when city1 < city2 then country2 else country1 end) as country2,
(case when city1 < city2 then city2 else city1 end) as city2,
count(*)
from dataset
group by (case when city1 < city2 then country1 else country2 end),
(case when city1 < city2 then city1 else city2 end),
(case when city1 < city2 then country2 else country1 end),
(case when city1 < city2 then city2 else city1 end)
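The query above relies on the two cities in a row having different names. If they can be equal (for example, two different places both called paris), `city1 < city2` is false for both orderings of the pair, so mirrored rows still land in different groups. Below is a minimal sketch of one way around that, assuming city and country are never NULL (a NULL makes the comparison unknown, which breaks the symmetry; wrap the columns in COALESCE if that can happen). It breaks ties on country and computes the ordering decision once in a CTE, so the GROUP BY does not have to repeat every CASE expression. The sample rows are copied from the question; the names `keep_order`, `country_a`, `city_a`, `country_b`, `city_b` and `n` are illustrative. It uses only standard constructs (WITH, UNION ALL, CASE), so it should run on Vertica, but treat that as an assumption.

```sql
-- Sample rows from the question, so the query runs on its own.
WITH dataset AS (
              SELECT 1 AS user_id, 'usa' AS country1, 'new york' AS city1, 'france' AS country2, 'paris' AS city2
    UNION ALL SELECT 2, 'usa',    'dallas',    'japan',  'tokyo'
    UNION ALL SELECT 3, 'india',  'mumbai',    'italy',  'rome'
    UNION ALL SELECT 4, 'france', 'paris',     'usa',    'new york'
    UNION ALL SELECT 5, 'brazil', 'sao paulo', 'russia', 'moscow'
),
-- Decide once per row whether (country1, city1) should stay first:
-- compare cities, and fall back to countries when the cities tie.
-- Assumes no NULLs in these four columns.
flagged AS (
    SELECT country1, city1, country2, city2,
           (city1 < city2 OR (city1 = city2 AND country1 <= country2)) AS keep_order
    FROM dataset
),
-- Put both locations in canonical order so mirrored rows become identical.
canonical AS (
    SELECT CASE WHEN keep_order THEN country1 ELSE country2 END AS country_a,
           CASE WHEN keep_order THEN city1    ELSE city2    END AS city_a,
           CASE WHEN keep_order THEN country2 ELSE country1 END AS country_b,
           CASE WHEN keep_order THEN city2    ELSE city1    END AS city_b
    FROM flagged
)
SELECT country_a, city_a, country_b, city_b, COUNT(*) AS n
FROM canonical
GROUP BY country_a, city_a, country_b, city_b
ORDER BY n DESC, country_a, city_a;
```

With the sample data, rows 1 and 4 collapse into a single group (usa, new york, france, paris) with n = 2, and each of the other three rows forms its own group. Against the real table, drop the sample `dataset` CTE and start the WITH clause at `flagged`.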