Snowflake subquery

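I have two tables: Transaction (ID, TERMINALID) and Terminal (ID, TERMINALID, EXPORT_DATE). The goal is to get, for each row of the Transaction table, the most recent matching record from the Terminal table. Snowflake is used as the backend.

I have this SQL query: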

SELECT tr.ID,
       (SELECT te.ID
        FROM "Terminal" te
        WHERE te.TERMINALID = tr.TERMINALID
        ORDER BY te.EXPORT_DATE DESC
        LIMIT 1)
FROM "Transaction" tr;

But I get this error:

SQL compilation error: Unsupported subquery type cannot be evaluated

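If I replace tr.TERMINALID with a specific value, the error disappears. So it seems I can't reference the parent table from the nested SELECT. Why is this not possible? The query works in MySQL.
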
SELECT
tr.ID
  , (SELECT te.ID
     FROM "Terminal" te 
     WHERE te.TERMINALID = tr.TERMINALID
     ORDER BY te.EXPORT_DATE DESC
     LIMIT 1
    ) AS the_id -- <<-- add an alias for the column
FROM "Transaction" tr
    ;

Update:

  • length for type varchar cannot exceed 10485760
  • just use the varchar (or text) type

This works here (with quoted identifiers):

CREATE TABLE "Transaction" ("ID" VARCHAR(123), "TERMINALID"  VARCHAR(123)) ;
CREATE TABLE "Terminal" ( "ID"  VARCHAR(123), "TERMINALID"  VARCHAR(123), "EXPORT_DATE" DATE);

SELECT tr."ID"
        , (SELECT te."ID"
        FROM "Terminal" te
        WHERE te."TERMINALID" = tr."TERMINALID"
        ORDER BY te."EXPORT_DATE" DESC
        LIMIT 1) AS meuk
FROM "Transaction" tr
        ;

Bonus update: avoid the scalar subquery and use a plain old NOT EXISTS(...) to get the record with the most recent date:

SELECT tr."ID"
        , te."ID" AS meuk
FROM "Transaction" tr
JOIN "Terminal" te ON te."TERMINALID" = tr."TERMINALID"
        AND NOT EXISTS ( SELECT *
        FROM "Terminal" nx
        WHERE nx."TERMINALID" = te."TERMINALID"
        AND nx."EXPORT_DATE" > te."EXPORT_DATE"
        )
        ;

I'm afraid Snowflake doesn't support this kind of correlated subquery.

You can achieve what you want by using FIRST_VALUE to compute the best id per terminalid:

-- First compute the best id per terminalid
-- (ordered by EXPORT_DATE, the date column from the question)
with sub1 as (
  select 
    terminalid, 
    first_value(id) over (partition by terminalid order by export_date desc) id
  from "Terminal"
),
-- Now, make sure there's only one id per terminalid
sub2 as (
  select 
    terminalid, 
    any_value(id) id
  from sub1
  group by terminalid
)
-- Now use that result
select tr.ID, sub2.id
FROM "Transaction" tr
JOIN sub2 ON tr.terminalid = sub2.terminalid;
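
A more compact variant of the same idea, as a sketch only: assuming the MAX_BY aggregate is available in your Snowflake account and the date column is EXPORT_DATE as in the question, the two CTEs collapse into a single GROUP BY. The names latest and latest_terminal_id are made up for illustration.

```sql
-- Sketch, not run against the asker's schema.
-- MAX_BY(a, b) returns the value of a from the row where b is largest.
WITH latest AS (
  SELECT terminalid,
         MAX_BY(id, export_date) AS latest_terminal_id
  FROM "Terminal"
  GROUP BY terminalid
)
SELECT tr.id, latest.latest_terminal_id
FROM "Transaction" tr
JOIN latest ON latest.terminalid = tr.terminalid;
```

If several rows share the latest export_date for a terminalid, which id comes back is not deterministic, so add a tie-breaker with a window function if that can happen in your data.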

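You can run the subqueries on their own first to see what they do.

We're working on improving our support for subqueries, and there may be a simpler rewrite, but I hope this helps.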

A few years later (2022), some correlated subqueries are supported, but not this one.

Using this data:

WITH transaction(id, terminalid) AS (
    SELECT * FROM VALUES
    (1,10),
    (2,11),
    (3,12)
), terminal(id, terminalid, export_date) AS (
     SELECT * FROM VALUES
    (100, 10, '2022-03-18'::date),
    (101, 10, '2022-03-19'::date),
    (102, 11, '2022-03-20'::date),
    (103, 11, '2022-03-21'::date),
    (104, 11, '2022-03-22'::date),
    (105, 12, '2022-03-23'::date)
)

So, compared with Marcin's answer, we can now use QUALIFY to select just one value per terminalid in a single step:

WITH last_terminal as (
    SELECT id, 
        terminalid
    FROM terminal
    QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc) = 1
)
SELECT tr.ID,
      te.id
FROM transaction AS tr
JOIN last_terminal AS te 
    ON te.TERMINALID = tr.TERMINALID
ORDER BY 1;

This gives:

ID ID
1 101
2 104
3 105
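
The sample-data CTEs and the QUALIFY query above are meant to be combined: last_terminal simply joins the same WITH list. Written out as one statement, using the same names as above (only the column alias terminal_row_id is added here for readability):

```sql
WITH transaction(id, terminalid) AS (
    SELECT * FROM VALUES
    (1, 10),
    (2, 11),
    (3, 12)
), terminal(id, terminalid, export_date) AS (
    SELECT * FROM VALUES
    (100, 10, '2022-03-18'::date),
    (101, 10, '2022-03-19'::date),
    (102, 11, '2022-03-20'::date),
    (103, 11, '2022-03-21'::date),
    (104, 11, '2022-03-22'::date),
    (105, 12, '2022-03-23'::date)
), last_terminal AS (
    -- Keep only the newest row per terminalid
    SELECT id,
        terminalid
    FROM terminal
    QUALIFY ROW_NUMBER() OVER (PARTITION BY terminalid ORDER BY export_date DESC) = 1
)
SELECT tr.id,
      te.id AS terminal_row_id
FROM transaction AS tr
JOIN last_terminal AS te
    ON te.terminalid = tr.terminalid
ORDER BY 1;
```

This should return the same three rows listed above.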

If you have multiple terminal rows per day, and terminal.id is an incrementing number, you can use:

QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc, id desc) = 1

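Now, if your tables are not that large, you can do the JOIN first and then prune with QUALIFY, which avoids the CTE. On large tables this hurts performance, so I would only use this form for ad-hoc queries, and swap to the CTE form where viable if performance becomes a problem.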

SELECT tr.ID,
      te.id
FROM transaction AS tr
JOIN terminal AS te 
    ON te.TERMINALID = tr.TERMINALID
QUALIFY row_number() over(PARTITION BY tr.id ORDER BY te.export_date desc, te.id desc) = 1    
ORDER BY 1;
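
One difference from the original scalar subquery is worth noting: an inner JOIN drops transactions that have no matching terminal, whereas the scalar subquery would have returned NULL for them. If those rows matter, a LEFT JOIN keeps them. A sketch using the sample data plus one made-up transaction (4, 13) that has no terminal, and assuming transaction.id is unique:

```sql
WITH transaction(id, terminalid) AS (
    SELECT * FROM VALUES
    (1, 10),
    (2, 11),
    (3, 12),
    (4, 13)   -- hypothetical row with no matching terminal
), terminal(id, terminalid, export_date) AS (
    SELECT * FROM VALUES
    (100, 10, '2022-03-18'::date),
    (101, 10, '2022-03-19'::date),
    (102, 11, '2022-03-20'::date),
    (103, 11, '2022-03-21'::date),
    (104, 11, '2022-03-22'::date),
    (105, 12, '2022-03-23'::date)
)
SELECT tr.id,
      te.id AS terminal_row_id
FROM transaction AS tr
LEFT JOIN terminal AS te
    ON te.terminalid = tr.terminalid
-- NULLS LAST because Snowflake sorts NULLs first in DESC order by default,
-- which matters if export_date can be NULL
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY tr.id
    ORDER BY te.export_date DESC NULLS LAST, te.id DESC
) = 1
ORDER BY 1;
```

With this data, transaction 4 is kept with a NULL terminal_row_id, and the other three rows are unchanged.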

This kind of subquery is currently not supported.

Working with subqueries - Limitations:

The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery


The query in question uses a correlated subquery with LIMIT, hence the error:

SELECT tr.ID,
       (SELECT te.ID
        FROM "Terminal" te
        WHERE te.TERMINALID = tr.TERMINALID  --correlation
        ORDER BY te.EXPORT_DATE DESC
        LIMIT 1)
FROM "Transaction" tr;
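
The same limitations section also says that correlated scalar subqueries are accepted when they can be statically determined to return one row, for example when the SELECT list is an aggregate with no GROUP BY. Under that rule, replacing ORDER BY ... LIMIT 1 with an aggregate may be enough. A sketch, assuming the MAX_BY aggregate is available and not verified against the asker's account; the alias latest_terminal_id is made up for illustration:

```sql
SELECT tr.ID,
       (SELECT MAX_BY(te.ID, te.EXPORT_DATE)   -- aggregate, so exactly one row
        FROM "Terminal" te
        WHERE te.TERMINALID = tr.TERMINALID    -- correlation
       ) AS latest_terminal_id
FROM "Transaction" tr;
```

If the compiler still rejects it, the JOIN-based rewrites keep the same intent without any correlated subquery.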