How to speed up the current query with an index

Question

I am using Azure SQL Database on a v12 server, and I have the following table:

CREATE TABLE [dbo].[AudienceNiches](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [WebsiteId] [nvarchar](128) NOT NULL,
    [VisitorId] [nvarchar](128) NOT NULL,
    [VisitDate] [datetime] NOT NULL,
    [Interest] [nvarchar](50) NULL,
    [Gender] [float] NULL,
    [AgeFrom18To24] [float] NULL,
    [AgeFrom25To34] [float] NULL,
    [AgeFrom45To54] [float] NULL,
    [AgeFrom55To64] [float] NULL,
    [AgeFrom65Plus] [float] NULL,
    [AgeFrom35To44] [float] NULL,
    CONSTRAINT [PK_AudienceNiches] PRIMARY KEY CLUSTERED ([Id] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)

I am running this query (updated):

`select  a.interest, count(interest) from (
select visitorid, interest
from audienceNiches
WHERE WebsiteId = @websiteid
AND VisitDate >= @startdate
AND VisitDate <= @enddate
group by visitorid, interest) as a
group by a.interest`

I have the following indexes (all ASC):

idx_WebsiteId_VisitDate_VisitorId
idx_WebsiteId_VisitDate
idx_VisitorId
idx_Interest

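The problem is that my query takes about 5 seconds to return 18K rows, and the whole table has 8.8 million records. If I expand the data a little, the time increases a lot. So what is the best index for this query? What am I missing?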

Answer 1

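Indexing can take almost unlimited understanding, but in your case I think you will see a good performance gain by indexing WebsiteId and VisitDate as separate indexes.

It is important to make sure your indexes are in good shape. You need to maintain them by keeping statistics up to date and rebuilding the indexes periodically.

Finally, you should examine the query plan whenever you tune query performance. SQL Server will tell you if it thinks it would benefit from an index on a column (or columns), and it will also alert you to other performance-related issues.

In Management Studio, press Ctrl+L to display the estimated execution plan and see what the query is doing.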
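The maintenance and measurement steps above could look roughly like the following. This is a minimal sketch, not a tuned maintenance plan: the parameter values are made up for illustration, and rebuilding every index on a table of this size may take a while.

```sql
-- Refresh optimizer statistics for the table.
UPDATE STATISTICS dbo.AudienceNiches;

-- Rebuild all indexes on the table to remove fragmentation.
ALTER INDEX ALL ON dbo.AudienceNiches REBUILD;

-- Report logical reads and CPU/elapsed time for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hypothetical parameter values, for illustration only.
DECLARE @websiteid nvarchar(128) = N'example-site',
        @startdate datetime      = '20160101',
        @enddate   datetime      = '20160131';

-- The query from the question, unchanged.
SELECT a.interest, COUNT(a.interest)
FROM (
    SELECT visitorid, interest
    FROM audienceNiches
    WHERE WebsiteId = @websiteid
      AND VisitDate >= @startdate
      AND VisitDate <= @enddate
    GROUP BY visitorid, interest
) AS a
GROUP BY a.interest;
```

Comparing the logical reads reported in the Messages tab before and after an index change gives a more stable measure than wall-clock time alone.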

Answer 2

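It is hard for me to write SQL without data to test against, but see whether this gives you the results you are looking for with a shorter execution time.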

SELECT interest, count(distinct visitorid)
FROM audienceNiches
WHERE WebsiteId = @websiteid
AND VisitDate between @startdate and @enddate
AND interest is not null 
GROUP BY interest
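To check that this rewrite matches the original nested query, a small sketch on made-up rows may help. The table variable and its contents are hypothetical, and the WebsiteId and date filters are left out for brevity.

```sql
-- Hypothetical sample data; only the columns the query touches are modelled.
DECLARE @sample TABLE (
    WebsiteId nvarchar(128) NOT NULL,
    VisitorId nvarchar(128) NOT NULL,
    VisitDate datetime      NOT NULL,
    Interest  nvarchar(50)  NULL
);

INSERT INTO @sample (WebsiteId, VisitorId, VisitDate, Interest) VALUES
    (N'site1', N'v1', '20160101', N'sports'),
    (N'site1', N'v1', '20160102', N'sports'),  -- same visitor, same interest
    (N'site1', N'v2', '20160103', N'sports'),
    (N'site1', N'v2', '20160104', N'music'),
    (N'site1', N'v3', '20160105', NULL);       -- no interest recorded

-- Original shape: de-duplicate (visitor, interest) pairs, then count per interest.
SELECT a.Interest, COUNT(a.Interest) AS Visitors
FROM (
    SELECT VisitorId, Interest
    FROM @sample
    GROUP BY VisitorId, Interest
) AS a
GROUP BY a.Interest;

-- Rewritten shape: count distinct visitors per non-null interest.
SELECT Interest, COUNT(DISTINCT VisitorId) AS Visitors
FROM @sample
WHERE Interest IS NOT NULL
GROUP BY Interest;
```

On this sample both queries report 2 visitors for sports and 1 for music. The original shape also returns a row for the NULL interest with a count of 0, which the rewrite filters out. A plain COUNT(Interest) without the de-duplication step would count visits rather than visitors, giving 3 for sports here.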

Answer 3

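The best index for this query is a composite index on these columns, in this order:

WebsiteId
VisitDate
Interest
VisitorId

This way the query can be answered entirely from the index. SQL Server can do a range scan on (WebsiteId, VisitDate), then exclude NULL Interest values, and finally count the distinct VisitorIds from the index. The index entries will be in the right order for these operations to run efficiently.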
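As a sketch, the composite index described above could be created like this. The index name is illustrative and not from the original post.

```sql
-- Index name is an assumption; the column order follows the list above.
CREATE NONCLUSTERED INDEX IX_AudienceNiches_WebsiteId_VisitDate_Interest_VisitorId
    ON dbo.AudienceNiches (WebsiteId, VisitDate, Interest, VisitorId);
```

If the existing idx_WebsiteId_VisitDate_VisitorId contains only the columns in its name, it does not carry Interest, so the query would still need to visit the clustered index for every matching row. Having all four columns in one index avoids those lookups.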

Answer 4

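Your query can be written like this: since the final result set does not return the visitorid column from the audienceNiches table, there is no need for two levels of GROUP BY. Check this query and let me know if you still face performance problems.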

select  interest, count(interest)
from audienceNiches
WHERE WebsiteId = @websiteid
AND VisitDate >= @startdate
AND VisitDate <= @enddate
group by interest

Answer 5

First, your updated query can effectively be simplified to:

select an.Interest, count(an.Interest)
from dbo.AudienceNiches an
where an.WebsiteId = @WebSiteId
    and an.VisitDate between @startdate and @enddate
group by an.Interest;

Second, depending on the cardinality of your data, one of the following indexes will give the best performance:

create index IX_AudienceNiches_WebSiteId_VisitDate_Interest on dbo.AudienceNiches
(WebSiteId, VisitDate, Interest);

or

create index IX_AudienceNiches_VisitDate_WebSiteId_Interest on dbo.AudienceNiches
(VisitDate, WebSiteId, Interest);

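However, as your data grows, I think that on average the latter will eventually become more efficient.

P.S. Your table is heavily denormalized in several respects. I only hope you know what you are doing.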

sql

tsql

sql-server

indexing

azure-sql-database