How to Get Rid of Duplicate Counts in Hive/Impala
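I'm trying to compute the total count for a particular column across three tables in Impala/Hive, but I only seem to get the total for each table separately. For example, I get a count for Poland from each table rather than one count for Poland across all three tables. I tried combining the tables with UNION, but that didn't work. The code I used is below.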
SELECT table1.country, COUNT(*)
FROM table1
GROUP BY table1.country
UNION
SELECT table2.country, COUNT(*)
FROM table2
GROUP BY table2.country
UNION
SELECT table3.country, COUNT(*)
FROM table3
GROUP BY table3.country
ORDER BY COUNT(country) DESC;
Use UNION ALL instead of UNION:
SELECT table1.country, COUNT(*) AS cnt
FROM table1
GROUP BY table1.country
UNION ALL
SELECT table2.country, COUNT(*)
FROM table2
GROUP BY table2.country
UNION ALL
SELECT table3.country, COUNT(*)
FROM table3
GROUP BY table3.country
-- ORDER BY applies to the combined result, so sort by the cnt alias rather than by an aggregate
ORDER BY cnt DESC;
UNION removes duplicate rows, so if two tables produce the same count for a country, only one of those rows is kept.
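To make the difference concrete, here is a minimal sketch with made-up literal rows standing in for the per-table results (the values 2, 2 and 5 are hypothetical, and it assumes the engine accepts a SELECT without a FROM clause, which Impala and recent Hive versions do):

```sql
-- Hypothetical per-table results: two tables both report Poland = 2, the third reports 5.
SELECT 'Poland' AS country, 2 AS cnt
UNION
SELECT 'Poland', 2
UNION
SELECT 'Poland', 5;
-- UNION returns two rows, (Poland, 2) and (Poland, 5): the repeated (Poland, 2) is dropped.

SELECT 'Poland' AS country, 2 AS cnt
UNION ALL
SELECT 'Poland', 2
UNION ALL
SELECT 'Poland', 5;
-- UNION ALL returns all three rows, so summing cnt gives 9 instead of 7.
```

With UNION the lost row silently lowers any total computed afterwards, which is why UNION ALL is the safer choice whenever the rows are going to be aggregated again.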
Edit:
If you want one row per country, wrap the query in a subquery and re-aggregate:
SELECT country, SUM(cnt)
FROM (SELECT table1.country, COUNT(*) as cnt
FROM table1
GROUP BY table1.country
UNION ALL
SELECT table2.country, COUNT(*)
FROM table2
GROUP BY table2.country
UNION ALL
SELECT table3.country, COUNT(*)
FROM table3
GROUP BY table3.country
) t
GROUP BY country;
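An equivalent variant, offered as a sketch rather than as part of the original answer, is to combine the raw country values first and count once. It uses the same table1, table2 and table3 as above; the alias total is just an illustrative name:

```sql
-- Stack the raw country values from all three tables, then count once per country.
SELECT country, COUNT(*) AS total
FROM (SELECT country FROM table1
      UNION ALL
      SELECT country FROM table2
      UNION ALL
      SELECT country FROM table3
     ) t
GROUP BY country
ORDER BY total DESC;  -- largest totals first
```

Both forms should return the same totals. Pre-aggregating each table, as in the query above, passes fewer rows to the outer GROUP BY, while this form is shorter to read.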