Azure SQL Data Warehouse temporary table creation with CTAS results in unexpected order of rows
I have written a commented code example, so I will let it do the explaining:
/*
Querying sys.system_views to produce a set of rows to use as an example.
No matter how many times the SELECT statement is run, the results are always the same.
*/
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS sequence,
s.name + '.' + sv.name as name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA'
/*
Creating a temporary table using the CTAS principle, as is documented at:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-develop-loops
(Since this behaviour was noticed when trying to execute procedures in a defined order.)
*/
CREATE TABLE #list WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS sequence,
s.name + '.' + sv.name as name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA'
/*
The results in the temporary table #list are not the
same as the results of the SELECT statement when run independently.
No matter how many times the temporary table is created,
the results are in the same order, which again,
is not the resulting order when running the SELECT statement.
*/
SELECT *
FROM #list;
DROP TABLE #list;
Below is an example image of the query results side by side.
Query results of SELECT and SELECT from #list
This problem is easily avoided by using a proper ORDER BY in the ROW_NUMBER function:
CREATE TABLE #list WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
ROW_NUMBER() OVER(ORDER BY s.name + '.' + sv.name) AS sequence,
s.name + '.' + sv.name as name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA'
So my question is: why is the order in the #list temporary table different from the order produced by the SELECT statement?
If you use non-deterministic SQL constructs, you can't expect deterministic results. ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) doesn't guarantee any particular ordering, as all rows are tied for the ordering value. Any ordering is correct, and you will just get whatever is most convenient for the execution plan.
– Martin Smith
Answered and discussed in the comments on the question.
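To make the point from the comment concrete, here is a minimal sketch of how the sequence could be made repeatable. It assumes that the pair (s.name, sv.name) is unique, so no two rows tie on the ordering key, and it adds an explicit ORDER BY when reading the table back, since a table by itself has no guaranteed row order. This is only one way to do it, not necessarily what the documentation recommends.

```sql
-- Assign the sequence from a deterministic ordering key.
-- Assumption: (schema name, view name) is unique, so there are no ties.
CREATE TABLE #list WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT
    ROW_NUMBER() OVER (ORDER BY s.name, sv.name) AS sequence,
    s.name + '.' + sv.name AS name
FROM sys.system_views sv
INNER JOIN sys.schemas s ON s.schema_id = sv.schema_id
WHERE s.name = 'INFORMATION_SCHEMA';

-- Rows in a table have no inherent order; ask for one explicitly.
SELECT sequence, name
FROM #list
ORDER BY sequence;

DROP TABLE #list;
```

With a tie-free key, the value of sequence for each name no longer depends on the execution plan, so the plain SELECT and the CTAS should number the rows identically.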