Problem with LEFT JOIN returning an OUTER JOIN in PrestoSQL
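I want to keep only the earliest record for each ID in a table, where dates are in yyyy-mm-dd format. If there are two or more records on the same day, I only want one of them, and I don't care which.
I tried joining the table with itself, but the LEFT JOIN doesn't work and returns more than one row.
Example of the original table: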
ID_vendor | sales | office | dt |
---|---|---|---|
1 | 3 | A | 2021-10-12 |
1 | 50 | B | 2021-10-13 |
2 | 109 | H | 2021-10-13 |
3 | 110 | H | 2021-10-05 |
4 | 111 | N | 2021-10-13 |
4 | 112 | W | 2021-10-13 |
4 | 113 | B | 2021-10-13 |
Expected result:
ID_vendor | sales | office |
---|---|---|
1 | 3 | A |
2 | 109 | H |
3 | 110 | H |
4 | 111 | N |
I tried using a partition without any luck, and now I'm stuck here with a LEFT JOIN that returns an OUTER JOIN.
Any help is welcome. Here is the code:
WITH t as (
SELECT id_vendor
, sales
, office
, min(dt) fst_date
FROM test_table
WHERE dt >= date('2021-09-12')
-- AND id_vendor = '1004618231015'
GROUP BY id_vendor, sales, office
ORDER BY id_vendor
)
, b AS (
SELECT id_vendor
, sales
, office
, dense_rank() over (order by fst_date) as rnk
FROM t
-- WHERE id_vendor = '1004618231015'
GROUP BY id_vendor
, sales
, office
, fst_date
)
, c AS (
SELECT id_vendor
FROM b WHERE rnk = 1
GROUP BY id_vendor
)
, d AS (
SELECT id_vendor
, sales
, office
FROM b WHERE rnk = 1
)
SELECT c.id_vendor
, d.sales
, d.office
FROM c
LEFT join d
ON c.id_vendor = d.id_vendor
You can simply use `row_number()` to get your expected result, like this:
select id_vendor, sales , office from (
SELECT id_vendor
, sales
, office
,Row_number() over(partition by id_vendor order by dt) rw
FROM test_table ) t
where t.rw=1
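A variant of the query above, as a minimal sketch: it restores the `dt >= date('2021-09-12')` filter from the question's own code and adds a second sort key so that same-day ties resolve the same way on every run. Using `sales` as the tie-breaker is an assumption made for illustration; the question says any same-day row is acceptable, so any column would do.

```sql
SELECT id_vendor, sales, office
FROM (
    SELECT id_vendor
         , sales
         , office
         -- number the rows of each vendor, earliest date first;
         -- sales is an assumed tie-breaker for rows sharing the same dt
         , row_number() OVER (PARTITION BY id_vendor ORDER BY dt, sales) AS rw
    FROM test_table
    WHERE dt >= date('2021-09-12')  -- filter copied from the question's query
) t
WHERE t.rw = 1
```

Without the extra sort key, `row_number()` still returns exactly one row per vendor, but which of the tied rows gets number 1 is not guaranteed.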
The task described does not need a join; using `row_number()` in a subselect/CTE, partitioned by `ID_vendor`, is enough:
-- sample data
WITH dataset (ID_vendor, sales, office, dt) AS (
VALUES (1, 3, 'A', date '2021-10-12'),
(1, 50, 'B', date '2021-10-13'),
(2, 109, 'H', date '2021-10-13'),
(3, 110, 'H', date '2021-10-05'),
(4, 111, 'N', date '2021-10-13'),
(4, 112, 'W', date '2021-10-13'),
(4, 113, 'B', date '2021-10-13')
)
-- query
select id_vendor,
sales,
office
from (
select *,
row_number() over (partition by id_vendor order by dt) rnk
from dataset
)
where rnk = 1
order by id_vendor
Output:
id_vendor | sales | office |
---|---|---|
1 | 3 | A |
2 | 109 | H |
3 | 110 | H |
4 | 111 | N |
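For comparison, here is a small sketch of how `dense_rank()` and `row_number()` treat rows that tie on `dt`, using only the three vendor 4 rows from the sample data. It may help explain the question's result: the `dense_rank() over (order by fst_date)` there has no `PARTITION BY`, so it ranks across all vendors rather than per vendor, and tied rows share a rank either way. The column aliases `dr` and `rn` are names chosen for this illustration.

```sql
-- the three same-day rows of vendor 4 from the sample data
WITH dataset (id_vendor, sales, office, dt) AS (
    VALUES (4, 111, 'N', date '2021-10-13'),
           (4, 112, 'W', date '2021-10-13'),
           (4, 113, 'B', date '2021-10-13')
)
SELECT id_vendor
     , sales
     , office
     -- rows tied on dt share a rank, so all three rows get dr = 1
     , dense_rank() OVER (PARTITION BY id_vendor ORDER BY dt) AS dr
     -- row_number is unique within the partition: 1, 2 and 3 in some order
     , row_number() OVER (PARTITION BY id_vendor ORDER BY dt) AS rn
FROM dataset
```

Filtering on `dr = 1` keeps all three rows, and joining a one-row-per-vendor set to them returns one output row per match. Filtering on `rn = 1` keeps a single row.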