Snowflake subquery
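I have two tables: Transaction(ID, TERMINALID) and Terminal(ID, TERMINALID, EXPORT_DATE). The goal is to get, for each row of the Transaction table, the most recent record from the Terminal table. Snowflake is used as the backend.
I have this SQL query: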
SELECT tr.ID,
(SELECT te.ID
FROM "Terminal" te
WHERE te.TERMINALID = tr.TERMINALID
ORDER BY te.EXPORT_DATE DESC
LIMIT 1)
FROM "Transaction" tr;
But I get this error:
SQL compilation error: Unsupported subquery type cannot be evaluated
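If I replace tr.TERMINALID with a specific value, the error goes away. So I can't reference the parent table from the nested SELECT. Why isn't this possible? The query works in MySQL.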
SELECT
tr.ID
, (SELECT te.ID
FROM "Terminal" te
WHERE te.TERMINALID = tr.TERMINALID
ORDER BY te.EXPORT_DATE DESC
LIMIT 1
) AS the_id -- <<-- add an alias for the column
FROM "Transaction" tr
;
Update:
length for type varchar cannot exceed 10485760
- just use the varchar (or text) type.
This works here (with quoted identifiers):
CREATE TABLE "Transaction" ("ID" VARCHAR(123), "TERMINALID" VARCHAR(123)) ;
CREATE TABLE "Terminal" ( "ID" VARCHAR(123), "TERMINALID" VARCHAR(123), "EXPORT_DATE" DATE);
SELECT tr."ID"
, (SELECT te."ID"
FROM "Terminal" te
WHERE te."TERMINALID" = tr."TERMINALID"
ORDER BY te."EXPORT_DATE" DESC
LIMIT 1) AS meuk
FROM "Transaction" tr
;
Bonus update: avoid the scalar subquery and use a plain old NOT EXISTS(...) to get the record with the most recent date:
SELECT tr."ID"
, te."ID" AS meuk
FROM "Transaction" tr
JOIN "Terminal" te ON te."TERMINALID" = tr."TERMINALID"
AND NOT EXISTS ( SELECT *
FROM "Terminal" nx
WHERE nx."TERMINALID" = te."TERMINALID"
AND nx."EXPORT_DATE" > te."EXPORT_DATE"
)
;
I'm afraid Snowflake doesn't support this kind of correlated subquery.
You can achieve what you want by using FIRST_VALUE to compute the best id per terminalid:
-- First compute per-terminalid best id
with sub1 as (
select
terminalid,
first_value(id) over (partition by terminalid order by export_date desc) id
from "Terminal"
),
-- Now, make sure there's only one per terminalid id
sub2 as (
select
terminalid,
any_value(id) id
from sub1
group by terminalid
)
-- Now use that result
select tr.ID, sub2.id
FROM "Transaction" tr
JOIN sub2 ON tr.terminalid = sub2.terminalid
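You can run the subqueries on their own first to see what they do.
We are working on improving support for subqueries, and there may be a simpler rewrite, but I hope this helps.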
A few years on (2022), some correlated subqueries are supported, but not this one.
Using this data:
WITH transaction(id, terminalid) AS (
SELECT * FROM VALUES
(1,10),
(2,11),
(3,12)
), terminal(id, terminalid, export_date) AS (
SELECT * FROM VALUES
(100, 10, '2022-03-18'::date),
(101, 10, '2022-03-19'::date),
(102, 11, '2022-03-20'::date),
(103, 11, '2022-03-21'::date),
(104, 11, '2022-03-22'::date),
(105, 12, '2022-03-23'::date)
)
So, compared with Marcin's answer, we can now use QUALIFY to select just one value per terminalid in a single step:
WITH last_terminal as (
SELECT id,
terminalid
FROM terminal
QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc) = 1
)
SELECT tr.ID,
te.id
FROM transaction AS tr
JOIN last_terminal AS te
ON te.TERMINALID = tr.TERMINALID
ORDER BY 1;
This gives:
ID | ID |
---|---|
1 | 101 |
2 | 104 |
3 | 105 |
If you have multiple terminal rows per day and terminal.id is an incrementing number, you can break the tie with:
QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc, id desc) = 1
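Now, if your tables are not that large, you can do the JOIN first and then prune with QUALIFY, which avoids the CTE. On large tables this will hurt performance, so I would only use this form for ad-hoc queries and swap to the CTE form if performance problems occur.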
SELECT tr.ID,
te.id
FROM transaction AS tr
JOIN terminal AS te
ON te.TERMINALID = tr.TERMINALID
QUALIFY row_number() over(PARTITION BY tr.id ORDER BY te.export_date desc, te.id desc) = 1
ORDER BY 1;
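One caveat with the JOIN-based rewrites above: an inner JOIN drops any transaction whose terminalid has no row in terminal, whereas the scalar subquery in the question would have returned NULL for it. Here is a minimal sketch of a LEFT JOIN variant, assuming that NULL behaviour is what you want. The extra transaction (4, 13) and the alias last_terminal_id are hypothetical, added only for illustration.

```sql
WITH transaction(id, terminalid) AS (
    SELECT * FROM VALUES
        (1, 10),
        (2, 11),
        (3, 12),
        (4, 13)   -- hypothetical row: no terminal with terminalid 13 exists
), terminal(id, terminalid, export_date) AS (
    SELECT * FROM VALUES
        (100, 10, '2022-03-18'::date),
        (101, 10, '2022-03-19'::date),
        (102, 11, '2022-03-20'::date),
        (103, 11, '2022-03-21'::date),
        (104, 11, '2022-03-22'::date),
        (105, 12, '2022-03-23'::date)
), last_terminal AS (
    -- keep exactly one row per terminalid: latest export_date, highest id on ties
    SELECT id,
           terminalid
    FROM terminal
    QUALIFY row_number() over(PARTITION BY terminalid ORDER BY export_date desc, id desc) = 1
)
SELECT tr.id,
       te.id AS last_terminal_id
FROM transaction AS tr
LEFT JOIN last_terminal AS te   -- LEFT JOIN keeps transactions without a terminal
    ON te.terminalid = tr.terminalid
ORDER BY 1;
```

With this data, transaction 4 should stay in the result with a NULL last_terminal_id instead of disappearing.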
This type of subquery is currently not supported.
Working with subqueries - Limitations:
The only type of subquery that allows a LIMIT / FETCH clause is an uncorrelated scalar subquery. Also, because an uncorrelated scalar subquery returns only 1 row, the LIMIT clause has little or no practical value inside a subquery
The query in question is a correlated subquery, hence the error.
SELECT tr.ID,
(SELECT te.ID
FROM "Terminal" te
WHERE te.TERMINALID = tr.TERMINALID --correlation
ORDER BY te.EXPORT_DATE DESC
LIMIT 1)
FROM "Transaction" tr;
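Since the limitation quoted above is about LIMIT inside a correlated subquery, one possible workaround is to keep the correlation but swap ORDER BY ... LIMIT 1 for an aggregate, which always yields a single row. Treat the sketch below as untested: it assumes the MAX_BY aggregate is available in your Snowflake account and that the compiler accepts this correlated aggregate. The alias last_terminal_id is a made-up name.

```sql
SELECT tr.ID,
       (SELECT MAX_BY(te.ID, te.EXPORT_DATE)   -- te.ID taken from the row with the greatest EXPORT_DATE
        FROM "Terminal" te
        WHERE te.TERMINALID = tr.TERMINALID    -- correlation is kept, LIMIT is gone
       ) AS last_terminal_id
FROM "Transaction" tr;
```

If several Terminal rows share the same latest EXPORT_DATE, which ID comes back is not guaranteed, so the QUALIFY form with an explicit tie-breaker is safer in that case.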