SQL inner join with filtering

Question

I have two tables, as follows:

Table1:

ID  Date  

1   2022-01-01
2   2022-02-01
3   2022-02-05

Table2:

ID   Date         Amount
 
1    2021-08-01     15
1    2022-02-10     15
2    2022-02-15      20
2    2021-01-01     15
2    2022-02-20     20
1    2022-03-01     15

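I want to select rows from Table2 such that only the rows dated after the Date in Table1 (for the same ID) are kept. Then, grouped by ID, I want the sum of Amount and the max(Date) from Table2 for each subset. So the result would look like: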

ID    Date         Amount
1     2022-03-01    30
2     2022-02-20    40

SQL newbie here... I tried an inner join but could not work out how to apply the date filter.

Attempted query:

with table1 as (select * from table1)
,table2 as (select * from table2)
select * from table1 a
inner join table2 b on (a.id=b.id)

Thanks!

Answer 1

I'm not personally familiar with Snowflake, but a standard SQL query that should work is:

select id, Max(date) Date, Sum(Amount) Amount
from Table2 t2
where exists (
  select * from Table1 t1 
  where t1.Id = t2.Id and t1.Date < t2.Date
)
group by Id;

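Note that because you only need data from Table2, exists is preferable to an inner join here. It is more efficient than a join in almost all cases and, at worst, performs the same.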
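Beyond performance, the two forms can return different totals. The sketch below is not part of the original answer: it uses Python's built-in sqlite3 module rather than Snowflake, stores dates as ISO-8601 text, and uses hypothetical data in which ID 1 appears twice in Table1. Under those assumptions, the join matches every Table2 row once per qualifying Table1 row and inflates the sum, while exists counts each Table2 row at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER, Date TEXT);
    CREATE TABLE Table2 (ID INTEGER, Date TEXT, Amount INTEGER);
    -- Hypothetical data: ID 1 appears twice in Table1.
    INSERT INTO Table1 VALUES (1, '2022-01-01'), (1, '2022-01-15');
    INSERT INTO Table2 VALUES (1, '2022-02-10', 15), (1, '2022-03-01', 15);
""")

# Semi-join: a Table2 row is kept if at least one earlier Table1 row exists.
exists_sql = """
    SELECT t2.ID, MAX(t2.Date), SUM(t2.Amount)
    FROM Table2 t2
    WHERE EXISTS (
        SELECT 1 FROM Table1 t1
        WHERE t1.ID = t2.ID AND t1.Date < t2.Date
    )
    GROUP BY t2.ID
"""

# Inner join: a Table2 row is repeated once per matching Table1 row.
join_sql = """
    SELECT t2.ID, MAX(t2.Date), SUM(t2.Amount)
    FROM Table1 t1
    JOIN Table2 t2 ON t1.ID = t2.ID AND t1.Date < t2.Date
    GROUP BY t2.ID
"""

exists_rows = conn.execute(exists_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(exists_rows)  # [(1, '2022-03-01', 30)]
print(join_rows)    # [(1, '2022-03-01', 60)]
```

With the question's sample data each ID appears only once in Table1, so both forms give the same result there.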

Answer 2

Here is how I would do it in Snowflake:

--create the tables and load data

--table1
CREATE TABLE TABLE1 (ID NUMBER, DATE DATE);

INSERT INTO TABLE1 VALUES (1,   '2022-01-01');
INSERT INTO TABLE1 VALUES (2  , '2022-02-01');
INSERT INTO TABLE1 VALUES (3  , '2022-02-05');

--table 2
CREATE TABLE TABLE2 (ID NUMBER, DATE DATE, AMOUNT NUMBER);
 
INSERT INTO TABLE2 VALUES(1,   '2021-08-01',    15);
INSERT INTO TABLE2 VALUES(1,   '2022-02-10',    15);
INSERT INTO TABLE2 VALUES(2,   '2022-02-15',    20);
INSERT INTO TABLE2 VALUES(2,   '2021-01-01',    15);
INSERT INTO TABLE2 VALUES(2,   '2022-02-20',    20);
INSERT INTO TABLE2 VALUES(1,   '2022-03-01',    15);

Now get the data with a SELECT:

SELECT TABLE1.ID, MAX(TABLE2.DATE), SUM(AMOUNT)
FROM TABLE1, TABLE2
WHERE TABLE1.ID = TABLE2.ID
  AND TABLE1.DATE < TABLE2.DATE 
GROUP BY TABLE1.ID;

Result:

ID	MAX(TABLE2.DATE)	SUM(AMOUNT)
1	2022-03-01	30
2	2022-02-20	40

Answer 3

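Much like Paul, I would use a JOIN, but I would put the date condition in the ON clause so that, if you join more tables, it is clearer to the SQL optimizer what you want on a per-table/join basis. I would also alias the tables and qualify every column with its alias, so there is no confusion about where a value comes from. Again, this is a habit that pays off when writing more complex SQL or cutting and pasting into larger blocks of code.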

So, with some CTEs for the data:

WITH table1(id, date) AS (
    SELECT * FROM VALUES 
        (1,   '2022-01-01'),
        (2  , '2022-02-01'),
        (3  , '2022-02-05')
), table2(id, date, amount) AS (
    SELECT * FROM VALUES
        (1, '2021-08-01'::date, 15),
        (1, '2022-02-10'::date, 15),
        (2, '2022-02-15'::date, 20),
        (2, '2021-01-01'::date, 15),
        (2, '2022-02-20'::date, 20),
        (1, '2022-03-01'::date, 15)
)

the following SQL:

SELECT a.id, 
    max(b.date) as max_date,
    sum(b.amount) as sum_amount
FROM table1 AS a
JOIN table2 AS b
    ON a.id = b.id AND a.date <= b.date
GROUP BY 1
ORDER BY 1;

ID	MAX_DATE	SUM_AMOUNT
1	2022-03-01	30
2	2022-02-20	40
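One detail worth checking: this query compares with <=, while the question asks for rows after the Table1 date and the other answers use <. The two agree on the sample data, but they differ when a Table2 row falls exactly on the Table1 date. The sketch below is an illustration only: it uses Python's sqlite3 module in place of Snowflake, ISO-8601 text dates, hypothetical data, and a hypothetical helper named totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, date TEXT);
    CREATE TABLE table2 (id INTEGER, date TEXT, amount INTEGER);
    -- Hypothetical data: one table2 row falls exactly on the table1 date.
    INSERT INTO table1 VALUES (1, '2022-01-01');
    INSERT INTO table2 VALUES (1, '2022-01-01', 10), (1, '2022-02-10', 15);
""")

def totals(op):
    """Run the grouped join with the given date comparison operator."""
    # Only two fixed operators are accepted, so formatting op into the SQL is safe.
    if op not in ("<", "<="):
        raise ValueError("op must be '<' or '<='")
    sql = f"""
        SELECT a.id, MAX(b.date) AS max_date, SUM(b.amount) AS sum_amount
        FROM table1 AS a
        JOIN table2 AS b
            ON a.id = b.id AND a.date {op} b.date
        GROUP BY a.id
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()

strict = totals("<")      # excludes the row dated on the table1 date
inclusive = totals("<=")  # includes it
print(strict)     # [(1, '2022-02-10', 15)]
print(inclusive)  # [(1, '2022-02-10', 25)]
```

Which operator is right depends on whether a row dated on the Table1 date should count as "after" it.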

Tags: sql, join, snowflake-cloud-data-platform