Synapse serverless TPC-H Query15 wrong syntax
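I am trying out the TPC-H queries. They all work fine except number 15: supplier_no is not recognized. Do you know how to rewrite it? The only change I made to all the queries was replacing limit with top.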
SELECT
--Query15
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
(
SELECT
l_suppkey AS supplier_no,
SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1996-01-01' AS date)
AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY
supplier_no
) revenue0
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
MAX(total_revenue)
FROM
(
SELECT
l_suppkey AS supplier_no,
SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1996-01-01' AS date)
AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY
supplier_no
) revenue1
)
ORDER BY
s_suppkey;
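If you are getting the errors below, you just need to make sure GROUP BY refers to the source column name (l_suppkey in this case) rather than the alias (supplier_no in this case):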
Msg 207, Level 16, State 1, Line 1 Invalid column name 'supplier_no'.
Msg 164, Level 15, State 1, Line 1 Each GROUP BY expression must contain at least one column that is not an outer reference.
Full statement, tested against a dedicated SQL pool in Azure Synapse Analytics:
SELECT
--Query15
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
(
SELECT
l_suppkey AS supplier_no,
SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1996-01-01' AS date)
AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY
l_suppkey
) revenue0
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
MAX(total_revenue)
FROM
(
SELECT
l_suppkey AS supplier_no,
SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
lineitem
WHERE
l_shipdate >= CAST('1996-01-01' AS date)
AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY
l_suppkey
) revenue1
)
ORDER BY
s_suppkey;
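NB: SQL Server lets you reference a column alias in the ORDER BY clause, but not in GROUP BY. GROUP BY is logically evaluated before the SELECT list that defines the alias, while ORDER BY is evaluated after it.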
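A minimal sketch of the alias rule, using a few made-up rows in place of lineitem (the sample_lineitem CTE and its values are hypothetical, not from the original post). It should run on its own without any tables:

```sql
-- Hypothetical sample rows standing in for lineitem.
WITH sample_lineitem AS (
    SELECT 1 AS l_suppkey,
           CAST(100.00 AS decimal(15, 2)) AS l_extendedprice,
           CAST(0.10 AS decimal(15, 2)) AS l_discount
    UNION ALL SELECT 1, 200.00, 0.00
    UNION ALL SELECT 2, 50.00, 0.20
)
SELECT
    l_suppkey AS supplier_no,
    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM sample_lineitem
GROUP BY l_suppkey      -- source column; GROUP BY supplier_no raises Msg 207
ORDER BY supplier_no;   -- the alias is accepted here
```

Changing the GROUP BY line to supplier_no should reproduce the "Invalid column name" error from the question.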
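Related discussion on the performance of Azure Synapse serverless SQL pools:
Just for fun, I repartitioned my TPC-H SF10 dbo.lineitem table by l_shipdate, added the filepath() metadata function to filter, and got the warm query down to 1 second, with the first run taking 7 seconds. So some caching does seem to be at work.
I know you should not need to do these very query-specific optimisations for other platforms, but I wanted to see whether the performance could be improved.
I guess Q14 is testing particular transformation rules in the various database engines.
Query: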
;WITH cte AS
(
SELECT
l_suppkey,
SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM OPENROWSET(
BULK 'enriched/tpch/tpch10/lineitem_partitioned/*/*.parquet',
DATA_SOURCE = 'MyDataSource',
FORMAT = 'PARQUET'
) x
WHERE x.filepath(1) = 1996
-- Half-open range, matching the Query15 predicate; BETWEEN would also include 1996-04-01
AND l_shipdate >= CAST('1996-01-01' AS DATE)
AND l_shipdate < CAST('1996-04-01' AS DATE)
GROUP BY l_suppkey
)
SELECT
s.s_suppkey,
s.s_name,
s.s_address,
s.s_phone,
c.total_revenue
FROM ext.supplier s
INNER JOIN cte c ON s.s_suppkey = c.l_suppkey
WHERE total_revenue = ( SELECT MAX(total_revenue) FROM cte );
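To see what filepath(1) returns for each folder, a sketch like the one below may help. It reuses the path and data source from the query above; the assumption that the first wildcard matches a year-named folder is inferred from the filepath(1) = 1996 filter and may not match your layout.

```sql
-- Lists each value matched by the first wildcard in the BULK path,
-- with the number of rows read from that folder.
SELECT
    x.filepath(1) AS partition_folder,
    COUNT_BIG(*) AS row_count
FROM OPENROWSET(
    BULK 'enriched/tpch/tpch10/lineitem_partitioned/*/*.parquet',
    DATA_SOURCE = 'MyDataSource',
    FORMAT = 'PARQUET'
) x
GROUP BY x.filepath(1)      -- group by the expression, not the alias
ORDER BY partition_folder;  -- the alias is fine in ORDER BY
```

Filtering on filepath(1) in the WHERE clause lets the serverless pool skip folders that do not match, which is likely where much of the speed-up in the query above comes from.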