SQL Join: have all values from both sides with an accumulative condition (Presto/AWS Athena)
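I've been working on this seemingly simple problem for a while without finding a solution. Suppose I have one table with a list of dates, and another table with phone numbers, people, and dates. I need a final result that contains every name and every date, with a third column holding the count of unique phone numbers that appear on any date equal to or later than that date. Here is an example: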
t1
+------------+
| date |
+------------+
| 01/01/2020 |
| 01/02/2020 |
| 01/03/2020 |
| 01/04/2020 |
| 01/05/2020 |
| 01/06/2020 |
| 01/07/2020 |
| 01/08/2020 |
+------------+
t2
+------+------------+--------------+
| name | date | phone_number |
+------+------------+--------------+
| John | 01/01/2020 | 123 |
| Mike | 01/02/2020 | 456 |
| Mike | 01/03/2020 | 789 |
| John | 01/04/2020 | 999 |
| Mike | 01/05/2020 | 111 |
| John | 01/06/2020 | 777 |
| Mike | 01/07/2020 | 123 |
| Mike | 01/08/2020 | 456 |
| John | 01/01/2020 | 789 |
| John | 01/02/2020 | 789 |
| Mike | 01/03/2020 | 789 |
| John | 01/04/2020 | 789 |
+------+------------+--------------+
The result I want:
+------+------------+-----------------------------------------------------------------+
| Name | Month | Cumulative Unique Numbers (Unique Numbers in any date >= Month) |
+------+------------+-----------------------------------------------------------------+
| John | 01/01/2020 | 4 |
| John | 01/02/2020 | 3 |
| John | 01/03/2020 | 3 |
| John | 01/04/2020 | 3 |
| John | 01/05/2020 | 1 |
| John | 01/06/2020 | 1 |
| John | 01/07/2020 | 0 |
| John | 01/08/2020 | 0 |
| Mike | 01/01/2020 | 4 |
| Mike | 01/02/2020 | 4 |
| Mike | 01/03/2020 | 4 |
| Mike | 01/04/2020 | 3 |
| Mike | 01/05/2020 | 3 |
| Mike | 01/06/2020 | 2 |
| Mike | 01/07/2020 | 2 |
| Mike | 01/08/2020 | 1 |
+------+------------+-----------------------------------------------------------------+
I've tried many approaches; this is the one I think comes closest:
SELECT * FROM t1
LEFT OUTER JOIN
(SELECT t1.date, COUNT(DISTINCT phone_number) count, name FROM t1
LEFT OUTER JOIN
t2
ON t1.date < t2.date
GROUP BY t1.date,t2.name
ORDER BY 2 DESC) temp
ON t1.date = temp.date
I'm still missing rows in the final result.
This is what I get:
+------+------------+-------+
| name | date | count |
+------+------------+-------+
| null | 2020-08-01 | 0 |
| John | 2020-01-01 | 3 |
| John | 2020-02-01 | 3 |
| John | 2020-03-01 | 3 |
| John | 2020-04-01 | 1 |
| John | 2020-05-01 | 1 |
| Mike | 2020-01-01 | 4 |
| Mike | 2020-02-01 | 4 |
| Mike | 2020-03-01 | 3 |
| Mike | 2020-04-01 | 3 |
| Mike | 2020-05-01 | 2 |
| Mike | 2020-06-01 | 2 |
| Mike | 2020-07-01 | 1 |
+------+------------+-------+
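Using a calendar table approach, we can build a reference table containing every name paired with every date, then left join it to the second table, which holds the actual data: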
SELECT
b.name,
a.date,
COUNT(DISTINCT t.phone_number) AS unique_numbers
FROM t1 a
CROSS JOIN (SELECT DISTINCT name FROM t2) b
LEFT JOIN t2 t
-- match rows on this date or any later date, so the count is cumulative
ON t.date >= a.date AND b.name = t.name
GROUP BY
b.name,
a.date
ORDER BY
b.name,
a.date;
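To check the idea against the sample data, here is a minimal sketch that loads the question's rows into an in-memory SQLite database and runs the calendar-table query with the cumulative condition `t.date >= a.date` taken from the question's requirement. SQLite is only a stand-in here, not Presto/Athena, and the dates are stored as ISO `YYYY-MM-DD` strings (an assumption, based on the format shown in the question's actual output) so that string comparison orders them correctly. The names `conn`, `months`, `rows_t2`, `query`, and `result` are illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Presto/Athena (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (date TEXT);
CREATE TABLE t2 (name TEXT, date TEXT, phone_number INTEGER);
""")

# Calendar table: the first day of each month, January to August 2020.
months = [f"2020-{m:02d}-01" for m in range(1, 9)]
conn.executemany("INSERT INTO t1 VALUES (?)", [(d,) for d in months])

# Sample rows from the question as (name, month number, phone_number).
rows_t2 = [
    ("John", 1, 123), ("Mike", 2, 456), ("Mike", 3, 789), ("John", 4, 999),
    ("Mike", 5, 111), ("John", 6, 777), ("Mike", 7, 123), ("Mike", 8, 456),
    ("John", 1, 789), ("John", 2, 789), ("Mike", 3, 789), ("John", 4, 789),
]
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?, ?)",
    [(name, months[m - 1], phone) for name, m, phone in rows_t2],
)

query = """
SELECT b.name, a.date, COUNT(DISTINCT t.phone_number) AS unique_numbers
FROM t1 a
CROSS JOIN (SELECT DISTINCT name FROM t2) b
LEFT JOIN t2 t
    ON t.date >= a.date AND t.name = b.name
GROUP BY b.name, a.date
ORDER BY b.name, a.date
"""
result = conn.execute(query).fetchall()

for name, date, unique_numbers in result:
    print(name, date, unique_numbers)
```

Because the cross join produces every (name, date) pair before the left join runs, pairs with no matching rows in `t2` still appear, and `COUNT(DISTINCT ...)` over their all-NULL phone numbers yields 0 rather than dropping the row.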