Problem with LEFT JOIN returning an OUTER JOIN in PrestoSQL

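I want to keep only the earliest record for each ID in a table where the dates are in yyyy-mm-dd format. If there are two or more records on the same day, I want to keep just one of them, and I don't care which.

I tried joining the table with itself, but the LEFT JOIN doesn't work and returns more than one row per ID.

Sample source table: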

ID_vendor sales office dt
1 3 A 2021-10-12
1 50 B 2021-10-13
2 109 H 2021-10-13
3 110 H 2021-10-05
4 111 N 2021-10-13
4 112 W 2021-10-13
4 113 B 2021-10-13

Expected result:

ID_vendor sales office
1 3 A
2 109 H
3 110 H
4 111 N

I tried using PARTITION BY without luck, and now I'm stuck here with a LEFT JOIN that returns what looks like an OUTER JOIN.

Any help is welcome. Here is the code:

WITH t as (
    SELECT id_vendor
        , sales 
        , office 
        , min(dt) fst_date
    FROM test_table
    WHERE dt >= date('2021-09-12')
    -- AND id_vendor = '1004618231015'
    GROUP BY id_vendor, sales, office 
    ORDER BY id_vendor
)
, b AS (
SELECT id_vendor
        , sales 
        , office
        , dense_rank() over (order by fst_date) as rnk
FROM t
-- WHERE id_vendor = '1004618231015'
GROUP BY id_vendor
        , sales 
        , office
        , fst_date
        )
, c AS (
SELECT id_vendor
FROM b WHERE rnk = 1
GROUP BY id_vendor
)
, d AS (
SELECT id_vendor
    , sales
    , office
FROM b WHERE rnk = 1
)
SELECT c.id_vendor
    , d.sales
    , d.office
FROM c
LEFT join d
    ON c.id_vendor = d.id_vendor

You can simply use row_number() to get your expected result, like this:

select id_vendor, sales , office from (
SELECT id_vendor
        , sales 
        , office 
        ,Row_number() over(partition by id_vendor order by dt) rw
    FROM test_table ) t
where t.rw=1
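
If it ever matters which of the same-day rows is kept, a tie-breaker can be added to the ORDER BY of the window. A minimal sketch, assuming sales is an acceptable tie-breaker (that choice is my assumption, not something stated in the question):

```sql
-- Sketch: same query as above, with an assumed tie-breaker (sales)
-- so the row kept among same-day duplicates is reproducible.
SELECT id_vendor, sales, office
FROM (
    SELECT id_vendor
        , sales
        , office
        , row_number() OVER (PARTITION BY id_vendor ORDER BY dt, sales) AS rw
    FROM test_table
) t
WHERE t.rw = 1
```

With the sample data this keeps the row with sales 111 for vendor 4, since all three of its rows share the same dt.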

The task as described does not require a join; using row_number() partitioned by ID_vendor in a subselect/CTE is enough:

-- sample data
WITH dataset (ID_vendor, sales, office, dt) AS (
    VALUES (1, 3, 'A', date '2021-10-12'),
        (1, 50, 'B', date '2021-10-13'),
        (2, 109, 'H', date '2021-10-13'),
        (3, 110, 'H', date '2021-10-05'),
        (4, 111, 'N', date '2021-10-13'),
        (4, 112, 'W', date '2021-10-13'),
        (4, 113, 'B', date '2021-10-13')
) 

-- query
select id_vendor,
    sales,
    office
from (
        select *,
            row_number() over (partition by id_vendor order by dt) rnk
        from dataset
    )
where rnk = 1
order by id_vendor

Output:

id_vendor sales office
1 3 A
2 109 H
3 110 H
4 111 N
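
As for why the attempt in the question returns several rows per vendor: dense_rank() gives tied rows the same rank, so every same-day row gets rnk = 1 and survives the filter, and the join then matches all of them. (The query in the question also has no PARTITION BY, so it ranks dates across all vendors rather than per vendor.) A small sketch on the vendor 4 rows from the sample data, comparing the two functions; the aliases dense_rnk and row_num are only illustrative:

```sql
-- Sketch: compare dense_rank() and row_number() on rows that tie on dt.
WITH dataset (id_vendor, sales, office, dt) AS (
    VALUES (4, 111, 'N', date '2021-10-13'),
        (4, 112, 'W', date '2021-10-13'),
        (4, 113, 'B', date '2021-10-13')
)
SELECT id_vendor,
    sales,
    office,
    -- ties share a rank, so all three rows get 1 here
    dense_rank() OVER (PARTITION BY id_vendor ORDER BY dt) AS dense_rnk,
    -- ties still get distinct numbers 1, 2, 3 (assigned arbitrarily among them)
    row_number() OVER (PARTITION BY id_vendor ORDER BY dt) AS row_num
FROM dataset
ORDER BY id_vendor, row_num
```

dense_rnk is 1 on all three rows, while row_num is 1 on exactly one of them, which is why filtering on row_number() leaves a single row per vendor.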