Optimize my Azure SQL PaaS Table and/or query to increase performance
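I am designing a table with a very specialized usage pattern.
Rows are written to the table continuously at a modest rate (about 25 records per second), and then every night I run one large query that extracts a lot of data.
My table creation script currently looks like this: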
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF NOT EXISTS (select * from sysobjects where name='records' and xtype='U')
CREATE TABLE [dbo].[records](
[TripID] varchar(255) NOT NULL,
[RecordTimeUTC] datetime2(0) NOT NULL,
[TimeOfDaySeconds] [int] NOT NULL,
[T0Latitude] [float] NOT NULL,
[T0Longitude] [float] NOT NULL,
[T1Latitude] [float] NULL,
[T1Longitude] [float] NULL,
[T2Latitude] [float] NULL,
[T2Longitude] [float] NULL,
[T3Latitude] [float] NULL,
[T3Longitude] [float] NULL,
[T4Latitude] [float] NULL,
[T4Longitude] [float] NULL,
[T5Latitude] [float] NULL,
[T5Longitude] [float] NULL,
[VehicleID] [int] NULL,
[ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY
) ON [PRIMARY]
GO
IF NOT EXISTS (select * from sys.indexes where name='TripIDRecordTimeIndex' and object_id = OBJECT_ID('dbo.records'))
CREATE INDEX TripIDRecordTimeIndex ON records (TripID, RecordTimeUTC desc)
GO
IF NOT EXISTS (select * from sys.indexes where name='TripIDIndex' and object_id = OBJECT_ID('dbo.records'))
CREATE INDEX TripIDIndex ON records (TripID)
GO
IF NOT EXISTS (select * from sys.indexes where name='RecordTimeUTCIndex' and object_id = OBJECT_ID('dbo.records'))
CREATE INDEX RecordTimeUTCIndex ON records (RecordTimeUTC desc)
GO
IF NOT EXISTS (select * from sys.objects where name like 'UniqueConstraint2' and parent_object_id = OBJECT_ID('dbo.records'))
ALTER TABLE [dbo].[records] ADD CONSTRAINT UniqueConstraint2 UNIQUE(VehicleID, RecordTimeUTC desc);
GO
IF NOT EXISTS (select * from sys.indexes where name='VehicleIDIndex' and object_id = OBJECT_ID('dbo.records'))
CREATE INDEX VehicleIDIndex ON records (VehicleID)
GO
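The table currently holds about 60 million rows and is just under 50 GB.
The query that extracts the data is very slow: it currently takes more than an hour. I'm not sure whether the root cause is the table design or the query design (possibly both).
For each TripID in a set that I specify, I need to extract the latest X records. There are about 10k distinct TripIDs, and I usually want to query about half of them. X also varies between them, so the best approach I have so far is to generate a script that looks roughly like this: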
SELECT rs.* FROM (SELECT *, ROW_NUMBER() over (Partition BY TripID ORDER BY RecordTimeUTC DESC ) AS Rank FROM records where TripID in (20141000,20441000,30011022,30011021,30011008,30012029,30012028,30012027,30011007,30011019,30011018,30012026,30012025,30012024,30011017,30011016,30012023,30012022,30011015,30011014,30012021,30012020,30011013,30011012,30013000,30013001,30013019,30013009,30011011,30011010,30011009,30013008,30013007,30012010,30012009,30013005,30013004,30013003,30012014,30012019,30013021,30013020,30011006,30011004,30012018,30012017,30012016,30013006,30011003,30011002,30012015,30012013,30013013,30013002,30011001,30011000,30011020,30012012,30012011,30011005,30011030,30012001,30012008,30012007,30011029,30011028,30012006,30012005,30011031,30011027,30012004,30012003,30011026,30011025,30011024,30012002,30012000,30012031,30011023,30012030,30015005,30016006,30016013,30016012,30014020,30014019,30014018,30016011,30016010,30014017,30014016,30016009,30016008,30014015,30014013,30014012,30016005,30016004,30016003,30014010,30014009,30016002,30016001,30014008,30014007,30016000,30016007,30014006,30014005,30014004,30014003,30014002,30014001,30014000,30014023,30014014,30015012,30015004,30015003,30013018,30013017,30015002,30015001,30013016,30013015,30013014,30015000,30015013,30015011,30013012,30013011,30015010,30015009,30013010,30014011,30015008,30015007,30014022,30014021,30015006,33651001,33661006)) rs WHERE Rank <= 690
UNION
SELECT rs.* FROM (SELECT *, ROW_NUMBER() over (Partition BY TripID ORDER BY RecordTimeUTC DESC ) AS Rank FROM records where TripID in (20431003,20431002,20431001,20432003,20432002,20432001,30221001,33861002,33861003)) rs WHERE Rank <= 855
UNION
SELECT rs.* FROM (SELECT *, ROW_NUMBER() over (Partition BY TripID ORDER BY RecordTimeUTC DESC ) AS Rank FROM records where TripID in (20171029,20171030,20002002,26122001)) rs WHERE Rank <= 45
UNION
...
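(The query above returns 690 rows per trip for the first list, 855 rows per trip for the second list, 45 per trip for the third, and so on. The real query is much larger; this is only a small part of it. In total I extract 10 to 15 million rows.)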
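As mentioned, performance is terrible. Is it a cloud issue? A design issue? Should I use a clustered index? (I tried clustering on TripID, but it was even worse.) Can I improve my query somehow, for example by extracting the same number of rows for every ID and then filtering?
I noticed that I have several extra indexes that probably aren't used by my query; I added them anyway because insert performance isn't a concern. The plan was for the query to use TripIDRecordTimeIndex.
Even after scaling the Azure SQL database up to S7 (800 DTUs), I can't get it to run quickly. Any feedback is appreciated.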
Edit: I recently changed TripID from int to varchar(255). Could that affect performance?
Edit2: Execution plan:
Download link to full execution plan
Edit3: I found that putting quotes ('') around the TripIDs in my query improved performance dramatically!
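A likely explanation for Edit3 is SQL Server's data type precedence: int ranks above varchar, so comparing the varchar(255) TripID column with bare numbers such as 20141000 makes the engine convert the column side to int. That conversion typically rules out an index seek on TripID and is consistent with the type-conversion warning mentioned in the answer below. Quoting the IDs keeps both sides varchar. Since the script is generated, a small helper can do the quoting; the sketch below is Python, and the function name `quoted_in_list` is hypothetical (it is not part of the original generator).

```python
def quoted_in_list(trip_ids):
    """Render TripIDs as a comma-separated list of T-SQL varchar literals.

    Hypothetical helper: each ID is wrapped in single quotes, and any embedded
    single quote is doubled, which is how T-SQL escapes quotes inside literals.
    Plain '...' (not N'...') keeps the literals varchar, matching the column.
    """
    return ",".join("'" + str(trip_id).replace("'", "''") + "'" for trip_id in trip_ids)


ids = [20141000, 20441000, 30011022]  # first few IDs from the question's script
where_clause = f"WHERE TripID IN ({quoted_in_list(ids)})"
print(where_clause)  # WHERE TripID IN ('20141000','20441000','30011022')
```

If the IDs could ever come from untrusted input, passing them as parameters (or a table-valued parameter) would be safer than building the string.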
Edit4: I added the index TheGameiswar suggested, and the difference is night and day! Thanks! The new execution plan is attached.
SELECT rs.* FROM (SELECT *,
ROW_NUMBER() over (Partition BY TripID ORDER BY RecordTimeUTC DESC )
AS Rank FROM records where TripID in (20141000,20441000,30011022,30011021,30011008,30012029,30012028,30012027,30011007,30011019,30011018, 30012026,30012025.....)) rs WHERE Rank <= 690
The indexes you have are not useful for this part of the query...
SELECT *,
ROW_NUMBER() over (Partition BY TripID ORDER BY RecordTimeUTC DESC )
AS Rank FROM records where TripID in
I would create an index like the following:
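-- Key columns match PARTITION BY TripID ORDER BY RecordTimeUTC DESC; INCLUDE the other selected columns
-- (ID, the clustered primary key in the script above, is carried in every nonclustered index automatically)
CREATE INDEX nci_sometst ON dbo.records (TripID, RecordTimeUTC DESC)
INCLUDE (TimeOfDaySeconds, T0Latitude, T0Longitude, T1Latitude, T1Longitude, T2Latitude, T2Longitude, T3Latitude, T3Longitude, T4Latitude, T4Longitude, T5Latitude, T5Longitude, VehicleID);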
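The point of the INCLUDE list is to make the index covering: every column the query reads lives in the index, so the engine does not need a Key Lookup into the clustered index for each of the millions of rows returned. SQLite has no INCLUDE clause, but its EXPLAIN QUERY PLAN output shows the same covering/non-covering distinction, so the sketch below uses Python's standard sqlite3 module as a local stand-in. The cut-down `records` table and the index names are hypothetical, and SQLite plans say nothing definitive about Azure SQL plans.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN, joined into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of dbo.records; ID plays the role of the clustered key.
conn.execute(
    "CREATE TABLE records (ID INTEGER PRIMARY KEY, TripID TEXT NOT NULL,"
    " RecordTimeUTC TEXT NOT NULL, VehicleID INTEGER, T0Latitude REAL)"
)
query = (
    "SELECT TripID, RecordTimeUTC, VehicleID, T0Latitude FROM records"
    " WHERE TripID = '20141000' ORDER BY RecordTimeUTC DESC"
)

# Key-only index: finds the TripID rows, then must visit the table for the other columns.
conn.execute("CREATE INDEX narrow_idx ON records (TripID, RecordTimeUTC)")
narrow_plan = plan(conn, query)

# Index carrying every selected column (SQLite's substitute for INCLUDE): covering.
conn.execute("DROP INDEX narrow_idx")
conn.execute("CREATE INDEX wide_idx ON records (TripID, RecordTimeUTC, VehicleID, T0Latitude)")
wide_plan = plan(conn, query)

print(narrow_plan)
print(wide_plan)
```

The first plan reports a plain `USING INDEX narrow_idx` (one table lookup per row), the second `USING COVERING INDEX wide_idx`. In SQL Server plans, the equivalent signal is whether a Key Lookup operator appears next to the index seek.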
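The nci_sometst index can help fetch the rows for the TripIDs in the IN list, but you are computing the rank in a derived table, so if the result set of the inner query is large, the index may not help much.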
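I would probably put the ranked rows into a temp table and create an index on the rank, so that the other unioned queries can benefit from it as well.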
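The temp-table idea can be sketched as: rank the rows once for all requested trips, keep only rank <= the largest X, index the result, and let each former UNION branch filter that much smaller table. This is also a concrete form of "extract the same number of rows per ID, then filter" from the question. In T-SQL the target would be a #temp table; the version below only models the data flow with Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample data, the `groups` mapping, and the choice of (TripID, rnk) as the index key are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (ID INTEGER PRIMARY KEY, TripID TEXT NOT NULL,"
    " RecordTimeUTC TEXT NOT NULL)"
)
# Hypothetical sample data: four trips, five timestamps each.
conn.executemany(
    "INSERT INTO records (TripID, RecordTimeUTC) VALUES (?, ?)",
    [(t, f"2016-01-01 00:00:0{s}") for t in "ABCD" for s in range(5)],
)

# X -> TripIDs that want their latest X rows (one entry per former UNION branch).
groups = {2: ["A", "B"], 1: ["C"]}
max_x = max(groups)  # largest X across all groups
all_trips = [t for trips in groups.values() for t in trips]
marks = ",".join("?" * len(all_trips))

# 1) Rank once for every requested trip; keep only rows that some branch could need.
conn.execute(
    "CREATE TEMP TABLE ranked (ID INTEGER, TripID TEXT, RecordTimeUTC TEXT, rnk INTEGER)"
)
conn.execute(
    "INSERT INTO ranked SELECT ID, TripID, RecordTimeUTC, rnk FROM ("
    " SELECT ID, TripID, RecordTimeUTC, ROW_NUMBER() OVER ("
    "  PARTITION BY TripID ORDER BY RecordTimeUTC DESC) AS rnk"
    f" FROM records WHERE TripID IN ({marks})) WHERE rnk <= ?",
    [*all_trips, max_x],
)
# 2) Index the rank (with TripID) so each branch's filter is a seek on the small table.
conn.execute("CREATE INDEX ranked_trip_rnk ON ranked (TripID, rnk)")

# 3) Each former UNION branch now reads `ranked` instead of rescanning `records`.
result = []
for x, trips in groups.items():
    branch_marks = ",".join("?" * len(trips))
    result += conn.execute(
        f"SELECT TripID, RecordTimeUTC FROM ranked WHERE TripID IN ({branch_marks})"
        " AND rnk <= ? ORDER BY TripID, rnk",
        [*trips, x],
    ).fetchall()
print(result)
```

Trip D is never requested, so it never enters `ranked`; trips A and B get two rows each and trip C gets one.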
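I also looked at your execution plan, and I can see that you are scanning the same table multiple times and reading a lot of rows each time.
Even setting aside the data type conversion warning, your query does not use any of your indexes efficiently.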
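One way to avoid scanning the table once per UNION branch (an alternative, not part of the answer above) is to store each requested TripID with its own X in a small limits table and join it, so a single ranked pass serves every trip. It also drops the duplicate-eliminating step that UNION (unlike UNION ALL) applies to the combined 10 to 15 million rows; since each branch presumably lists different TripIDs, UNION ALL would return the same rows. The sketch uses Python's sqlite3 as a stand-in; `trip_limits` and `MaxRows` are hypothetical names. In T-SQL, the same limits table could instead drive `CROSS APPLY (SELECT TOP (l.MaxRows) ... ORDER BY RecordTimeUTC DESC)`, which may let the engine seek into the (TripID, RecordTimeUTC) index per trip; which variant is faster would need to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (ID INTEGER PRIMARY KEY, TripID TEXT NOT NULL,"
    " RecordTimeUTC TEXT NOT NULL)"
)
conn.execute("CREATE INDEX trip_time ON records (TripID, RecordTimeUTC)")
# Hypothetical sample data: four trips, five timestamps each.
conn.executemany(
    "INSERT INTO records (TripID, RecordTimeUTC) VALUES (?, ?)",
    [(t, f"2016-01-01 00:00:0{s}") for t in "ABCD" for s in range(5)],
)

# Hypothetical limits table: one row per requested trip with its own X.
# Trip D is not requested, so it must not appear in the output.
conn.execute(
    "CREATE TEMP TABLE trip_limits (TripID TEXT PRIMARY KEY, MaxRows INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO trip_limits VALUES (?, ?)", [("A", 2), ("B", 3), ("C", 1)]
)

# One statement replaces all UNION branches: join, rank once, apply each trip's limit.
result = conn.execute(
    """
    SELECT TripID, RecordTimeUTC, rnk
    FROM (
        SELECT r.TripID, r.RecordTimeUTC, l.MaxRows,
               ROW_NUMBER() OVER (PARTITION BY r.TripID
                                  ORDER BY r.RecordTimeUTC DESC) AS rnk
        FROM records AS r
        JOIN trip_limits AS l ON l.TripID = r.TripID
    )
    WHERE rnk <= MaxRows
    ORDER BY TripID, rnk
    """
).fetchall()
print(result)
```

Each trip gets exactly its own X (A: 2, B: 3, C: 1), and adding or changing a trip means inserting a row into the limits table rather than regenerating the script.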