Synapse serverless TPC-H Query15 wrong syntax

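I am trying the TPC-H queries, and all of them work fine except number 15. Basically, supplier_no is not recognized. Do you know how to rewrite it? The only change I made to all the queries was replacing limit with top.
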
SELECT
    --Query15
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    (
        SELECT
            l_suppkey AS supplier_no,
            SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
        FROM
            lineitem
        WHERE
            l_shipdate >= CAST('1996-01-01' AS date)
            AND l_shipdate < CAST('1996-04-01' AS date)
        GROUP BY
            supplier_no
    ) revenue0
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            MAX(total_revenue)
        FROM
            (
                SELECT
                    l_suppkey AS supplier_no,
                    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
                FROM
                    lineitem
                WHERE
                    l_shipdate >= CAST('1996-01-01' AS date)
                    AND l_shipdate < CAST('1996-04-01' AS date)
                GROUP BY
                    supplier_no
            ) revenue1
    )
ORDER BY
    s_suppkey;

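If you get the following errors, you just need to make sure the GROUP BY refers to the source column name (l_suppkey in this case) rather than the alias (supplier_no in this case):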

Msg 207, Level 16, State 1, Line 1 Invalid column name 'supplier_no'.

Msg 164, Level 15, State 1, Line 1 Each GROUP BY expression must contain at least one column that is not an outer reference.
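
As a minimal sketch of the first message, here is the revenue0 aggregation on its own, without the date filter. In T-SQL the GROUP BY clause is resolved before the SELECT list, so an alias defined in the SELECT list is not yet visible to it. The second message most likely comes from the nested revenue1 subquery: supplier_no is not a column of lineitem, so the name appears to bind to revenue0.supplier_no in the enclosing query, which makes it an outer reference. That reading is an inference from the error text, not something confirmed above.

```sql
-- The revenue0 aggregation, reduced to the part that matters.
SELECT
    l_suppkey AS supplier_no,
    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
    lineitem
GROUP BY
    l_suppkey;  -- source column: accepted
-- Writing GROUP BY supplier_no above instead is what raises
-- Msg 207, Invalid column name 'supplier_no'.
```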

Full statement, tested against a dedicated SQL pool in Azure Synapse Analytics:

SELECT
    --Query15
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    total_revenue
FROM
    supplier,
    (
        SELECT
            l_suppkey AS supplier_no,
            SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
        FROM
            lineitem
        WHERE
            l_shipdate >= CAST('1996-01-01' AS date)
            AND l_shipdate < CAST('1996-04-01' AS date)
        GROUP BY
            l_suppkey
    ) revenue0
WHERE
    s_suppkey = supplier_no
    AND total_revenue = (
        SELECT
            MAX(total_revenue)
        FROM
            (
                SELECT
                    l_suppkey AS supplier_no,
                    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
                FROM
                    lineitem
                WHERE
                    l_shipdate >= CAST('1996-01-01' AS date)
                    AND l_shipdate < CAST('1996-04-01' AS date)
                GROUP BY
                    l_suppkey
            ) revenue1
    )
ORDER BY
    s_suppkey;

NB SQL Server can reference aliases in the ORDER BY clause, but not in GROUP BY.
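
The alias rule can be sketched against the same lineitem table. The first statement shows the alias being accepted in ORDER BY. The second shows one common way to group by an alias anyway, by defining it one level down in a derived table. The names d and revenue are made up for illustration, and neither statement was run against Synapse for this note.

```sql
-- ORDER BY is resolved after the SELECT list, so the alias is visible.
SELECT
    l_suppkey AS supplier_no,
    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM
    lineitem
GROUP BY
    l_suppkey
ORDER BY
    supplier_no;

-- To group by the alias, define it in a derived table first.
-- "d" and "revenue" are hypothetical names.
SELECT
    d.supplier_no,
    SUM(d.revenue) AS total_revenue
FROM
    (
        SELECT
            l_suppkey AS supplier_no,
            l_extendedprice * (1 - l_discount) AS revenue
        FROM
            lineitem
    ) d
GROUP BY
    d.supplier_no;
```

The derived-table form is mostly useful when the grouping key is a longer expression that you would rather not repeat. For a plain column such as l_suppkey, grouping by the source column is simpler.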

Related discussion on the performance of Azure Synapse serverless SQL pools:

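Just for fun, I repartitioned my TPC-H SF10 dbo.lineitem table by l_shipdate and added the filepath() metadata function as a filter. That got the warm query down to 1 second, with the first run taking 7 seconds. So some caching does seem to be at work.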
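
For context, filepath(n) in a serverless OPENROWSET query returns the part of the file path matched by the n-th wildcard in the BULK pattern. Filtering on it lets the engine skip folders that do not match. The sketch below lists which folder values exist and how many rows each holds. It assumes the path and data source from the query further down, with one folder per year under lineitem_partitioned. That layout is inferred from the filter x.filepath(1) = 1996 and is not stated explicitly. The column names partition_folder and row_count are made up.

```sql
-- Lists the folder value matched by the first * and the rows under it.
-- Assumes one folder per year below lineitem_partitioned.
SELECT
    x.filepath(1) AS partition_folder,
    COUNT_BIG(*) AS row_count
FROM OPENROWSET(
    BULK 'enriched/tpch/tpch10/lineitem_partitioned/*/*.parquet',
    DATA_SOURCE = 'MyDataSource',
    FORMAT = 'PARQUET'
    ) x
GROUP BY x.filepath(1)
ORDER BY partition_folder;
```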

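I know you should not need to do these very query-specific optimisations for other platforms, but I wanted to see whether it was possible to improve performance.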

I guess Q14 is testing a specific transformation rule in the various database engines:

Query:

;WITH cte AS
(
SELECT
    l_suppkey,
    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM OPENROWSET(
    BULK 'enriched/tpch/tpch10/lineitem_partitioned/*/*.parquet',
    DATA_SOURCE = 'MyDataSource',
    FORMAT = 'PARQUET'
    ) x
WHERE x.filepath(1) = 1996
  AND l_shipdate >= CAST('1996-01-01' AS DATE)
  AND l_shipdate < CAST('1996-04-01' AS DATE)
GROUP BY l_suppkey
)
SELECT
    s.s_suppkey,
    s.s_name,
    s.s_address,
    s.s_phone,
    c.total_revenue
FROM ext.supplier s
    INNER JOIN cte c ON s.s_suppkey = c.l_suppkey
WHERE total_revenue = ( SELECT MAX(total_revenue) FROM cte );
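
The query above reads the CTE twice, once for the join and once for MAX. A possible alternative, sketched here against the plain supplier and lineitem tables from the question, ranks the suppliers by revenue with a window function and keeps rank 1. RANK() gives tied rows the same rank, so all suppliers sharing the top revenue are returned, as with the MAX comparison. The names revenue, ranked and revenue_rank are made up for illustration. This variant was not timed, and whether it is any faster depends on the plan the engine picks.

```sql
;WITH revenue AS
(
SELECT
    l_suppkey AS supplier_no,
    SUM(l_extendedprice * (1 - l_discount)) AS total_revenue
FROM lineitem
WHERE l_shipdate >= CAST('1996-01-01' AS date)
  AND l_shipdate < CAST('1996-04-01' AS date)
GROUP BY l_suppkey
),
ranked AS
(
-- Rank 1 goes to every supplier tied for the highest revenue.
SELECT
    supplier_no,
    total_revenue,
    RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
FROM revenue
)
SELECT
    s.s_suppkey,
    s.s_name,
    s.s_address,
    s.s_phone,
    r.total_revenue
FROM supplier s
    INNER JOIN ranked r ON s.s_suppkey = r.supplier_no
WHERE r.revenue_rank = 1
ORDER BY s.s_suppkey;
```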