How to set the seed and increment values for an identity column in Azure Synapse?
I have a table in an Azure Synapse data warehouse with the following DDL:
CREATE TABLE [trans_customer_cdm_ejkb].[cdm_file_process_history]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[layer] [varchar](500) NOT NULL,
[ingest_partition] [varchar](100) NULL,
[status] [varchar](25) NULL,
[last_update_time] [datetime2](7) NULL,
[pipeline_run_id] [varchar](200) NULL
)
WITH
(
DISTRIBUTION = HASH ( [ingest_partition] ),
CLUSTERED COLUMNSTORE INDEX
)
GO
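I used the SQL below to append a row to this table three times. The id column came out as if its seed were 45 and its increment 60, even though the column is defined as IDENTITY(1,1).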
DECLARE @now DATETIME2 = GETDATE()
INSERT INTO trans_customer_cdm_ejkb.cdm_file_process_history
(layer, ingest_partition, [status], last_update_time, pipeline_run_id)
VALUES ('src2stg', '2022-02-18-03', 'success', @now, 'Test')
SELECT * FROM trans_customer_cdm_ejkb.cdm_file_process_history
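I also used the SQL below to check the metadata, and it reports a seed value of 1 and an increment value of 1. However, the table does not produce the id values I expected.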
SELECT sm.name AS schema_name
, tb.name AS table_name
, co.name AS column_name
, ic.seed_value
, ic.increment_value
FROM sys.schemas AS sm
JOIN sys.tables AS tb ON sm.schema_id = tb.schema_id
JOIN sys.columns AS co ON tb.object_id = co.object_id
JOIN sys.identity_columns AS ic ON co.object_id = ic.object_id
AND co.column_id = ic.column_id
WHERE sm.name = 'trans_customer_cdm_ejkb'
AND tb.name = 'cdm_file_process_history'
;
How can I fix this?
Kind regards,
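IDENTITY columns in an Azure Synapse Analytics dedicated SQL pool do guarantee unique values, but they do not guarantee sequential values. This is because the data is spread across 60 distributions, and each distribution has its own set of unique identity values. Your table is hash-distributed on ingest_partition, so all three of your rows (which share the value '2022-02-18-03') land on the same distribution, and you only ever see that one distribution's values, which is why consecutive inserts appear to step by 60.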
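To make the arithmetic concrete, here is a small Python model. It is an assumption, not Synapse's documented allocation algorithm: it supposes that distribution d (numbered 1 to 60) starts at seed + (d - 1) * increment and then steps by 60 * increment. Under that assumption it reproduces the numbers in the question when the rows land on distribution 45, and it shows why the values are unique across the pool but not consecutive within it.

```python
from typing import List

# Assumed, simplified model of IDENTITY allocation in a dedicated SQL pool:
# every distribution hands out its own interleaved sequence of values.
NUM_DISTRIBUTIONS = 60  # dedicated SQL pools spread data over 60 distributions


def identity_values(seed: int, increment: int, distribution: int, count: int) -> List[int]:
    """First `count` values that `distribution` (1-based) would generate in this model."""
    start = seed + (distribution - 1) * increment  # per-distribution offset
    step = NUM_DISTRIBUTIONS * increment           # skip the other 59 distributions' slots
    return [start + i * step for i in range(count)]


# All three rows share ingest_partition '2022-02-18-03', so HASH distribution
# sends them to one distribution; 45 is the one implied by the question.
print(identity_values(seed=1, increment=1, distribution=45, count=3))  # [45, 105, 165]

# Across all 60 distributions the values never collide,
# yet any single distribution's values are 60 apart.
all_values = [v for d in range(1, NUM_DISTRIBUTIONS + 1)
              for v in identity_values(1, 1, d, 5)]
assert len(all_values) == len(set(all_values))
```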
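If having a sequential column is important to you, recreate the table without the IDENTITY property and change your INSERT statement to the following code, which generates sequential IDs: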
DECLARE @now DATETIME2 = GETDATE()
INSERT INTO trans_customer_cdm_ejkb.cdm_file_process_history
(id, layer, ingest_partition, [status], last_update_time, pipeline_run_id)
SELECT
(SELECT ISNULL(MAX(id),0) FROM trans_customer_cdm_ejkb.cdm_file_process_history) +1 as id,
'src2stg', '2022-02-18-03', 'success', @now, 'Test'
SELECT * FROM trans_customer_cdm_ejkb.cdm_file_process_history
Because your code inserts only one row, I used +1, but when inserting multiple rows you would typically use + ROW_NUMBER() OVER (ORDER BY [SomeColumn]) instead.
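The multi-row version of this pattern can be sketched as follows. This is not Synapse code: it uses Python's built-in sqlite3 module as a local stand-in (SQLite 3.25 or later is needed for ROW_NUMBER()), drops the schema prefix and the Synapse-only WITH options, uses COALESCE in place of ISNULL, and introduces a hypothetical staging_rows table as the source of the batch.

```python
import sqlite3

# SQLite stand-in for the Synapse table: no IDENTITY, so id is supplied explicitly.
DDL = """
CREATE TABLE cdm_file_process_history (
    id               INTEGER NOT NULL,
    layer            TEXT    NOT NULL,
    ingest_partition TEXT,
    status           TEXT,
    last_update_time TEXT,
    pipeline_run_id  TEXT
)
"""

# Hypothetical staging table holding the rows to append in one batch.
STAGING_DDL = """
CREATE TABLE staging_rows (
    layer TEXT NOT NULL, ingest_partition TEXT, status TEXT, pipeline_run_id TEXT
)
"""

# Current max id (0 for an empty table) plus a per-row offset 1..n from ROW_NUMBER().
APPEND_SQL = """
INSERT INTO cdm_file_process_history
    (id, layer, ingest_partition, status, last_update_time, pipeline_run_id)
SELECT
    (SELECT COALESCE(MAX(id), 0) FROM cdm_file_process_history)
        + ROW_NUMBER() OVER (ORDER BY s.ingest_partition) AS id,
    s.layer, s.ingest_partition, s.status, datetime('now'), s.pipeline_run_id
FROM staging_rows AS s
"""


def append_batch(conn: sqlite3.Connection, rows) -> None:
    """Replace the staged batch with `rows` and append them with sequential ids."""
    conn.execute("DELETE FROM staging_rows")
    conn.executemany("INSERT INTO staging_rows VALUES (?, ?, ?, ?)", rows)
    conn.execute(APPEND_SQL)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.execute(STAGING_DDL)

# Three single-row appends, mirroring the three runs in the question.
for _ in range(3):
    append_batch(conn, [("src2stg", "2022-02-18-03", "success", "Test")])

# One multi-row append: ids continue from the current maximum.
append_batch(conn, [
    ("src2stg", "2022-02-18-04", "success", "Test"),
    ("src2stg", "2022-02-18-05", "success", "Test"),
])

ids = [r[0] for r in conn.execute("SELECT id FROM cdm_file_process_history ORDER BY id")]
print(ids)  # [1, 2, 3, 4, 5]
```

Because the next id is derived from MAX(id) at insert time, two loads running at the same time could compute the same ids; this pattern assumes only one writer appends to the table at a time.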