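Estimated number of rows is way off in execution plan
I have a situation where the estimated number of rows in the execution plan is way off.
The columns in my join are varchar(50). I tried different indexes, but that did not reduce the problem. I even tried using an indexed temp table. What else can I do?
PS: This is the first place where the estimates start to drift... and the table is not big (48,000 rows).
The code is: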
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM TableA
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
I know this looks like it could be rewritten as:
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
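But the results are not the same, and I don't want to change the results because I'm not sure whether the creator of this code knew what they were doing.
Some statistics on the columns: householdnumber is always equal to householdid. householdid is nvarchar(50) but householdnumber is varchar(40). The table has 48877 rows. There are 48029 distinct combinations of householdnumber, householdid, primaryCustomerID. The distinct count of primaryCustomerID is 47152.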
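For reference, figures like the ones above could be gathered with queries along these lines. This is only a sketch against the TableA columns named in the question; the result aliases (TotalRows, DistinctCombinations, DistinctCustomers, MismatchedRows) are made up for illustration.

```sql
-- Total number of rows in the table
SELECT COUNT(*) AS TotalRows
FROM TableA;

-- Number of distinct (householdnumber, householdid, primaryCustomerID) combinations.
-- T-SQL does not accept COUNT(DISTINCT ...) over several columns, so count a derived table.
SELECT COUNT(*) AS DistinctCombinations
FROM (
    SELECT DISTINCT householdnumber, householdid, primaryCustomerID
    FROM TableA
) AS d;

-- Number of distinct primaryCustomerID values (NULLs are not counted)
SELECT COUNT(DISTINCT primaryCustomerID) AS DistinctCustomers
FROM TableA;

-- Rows where the two household columns disagree; 0 is expected if they are always equal.
-- Rows where either column is NULL are not counted by this comparison.
SELECT COUNT(*) AS MismatchedRows
FROM TableA
WHERE householdnumber <> householdid;
```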
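Regarding the code: it looks like the difference between the larger (original) version and the simpler GROUP BY version is that the original finds the minimum profilecreateddate for anyone in that household, whereas your simpler version finds the minimum profilecreateddate for the specific primarycustomerid.
For example (using simpler data):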
CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);
INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate) VALUES
(1, 1, 1, '20201001'),
(1, 1, 1, '20201002'),
(1, 1, 2, '20201003');
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM #TableA;
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN #TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
SELECT * FROM #Profile;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from #TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
SELECT * FROM #Profile2;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-03 00:00:00.000
*/
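As you can see above, the PROFILECREATEDDATE for row 2 differs.
So you could try the code below, which should give the same results as the original set. See how the timing goes (and confirm that it matches the original results).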
SELECT DISTINCT t1.householdnumber, t1.householdid, primaryCustomerID,
MIN([ProfileCreatedDate]) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE
INTO #Profile3
FROM #TableA t1;
SELECT * FROM #Profile3;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
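One way to do that confirmation, as a minimal sketch: compare the two temp tables with EXCEPT in both directions. It assumes #Profile and #Profile3 from the example above still exist in the session and have the same four columns. If the rewrite matches the original, both queries should return no rows.

```sql
-- Rows produced by the original query that the rewrite did not produce
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile
EXCEPT
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile3;

-- Rows produced by the rewrite that the original query did not produce
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile3
EXCEPT
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile;
```

EXCEPT compares whole rows and treats NULLs as equal, so a household with no ProfileCreatedDate will not show up as a false difference. Note that it compares distinct rows only, which is fine here because both result sets are already distinct.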