Estimated number of rows is way off in execution plan

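I have a situation where the estimated number of rows in the execution plan is way off.

The columns in my join are varchar(50). I tried different indexes, but that did not reduce the problem. I even tried using an indexed temp table. What else can I do?

PS This is the first place where the estimates start to drift... and the table is not large (48,000 rows).

The code is: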

SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM TableA;

SELECT
    A.*,
    MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
FROM #Households AS A
    LEFT JOIN TableA AS B
      ON A.[HouseholdNumber] = B.[HouseholdNumber] AND A.[HouseholdId] = B.[HouseholdId]
GROUP BY A.householdnumber, A.householdid, A.primaryCustomerID;

I know it looks like this could be rewritten as:

SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE 
INTO #Profile2
from TableA
GROUP BY householdnumber, householdid, primaryCustomerID;

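But the results are not the same, and I don't want to change the results, because I am not sure whether the creator of this code knew what they were doing.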

Some statistics on the columns: householdnumber is always equal to householdid. householdid is nvarchar(50) and householdnumber is varchar(40). The table has 48877 rows. There are 48029 distinct combinations of householdnumber, householdid, primaryCustomerID, and 47152 distinct values of primaryCustomerID.
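For reference, figures like these can be gathered with queries along the following lines. This is only a sketch: it assumes the table and column names used in the queries above, and the output column aliases are made up for illustration.

```sql
-- Total rows, distinct customers, and how often the two household columns agree.
-- The varchar/nvarchar comparison relies on an implicit conversion.
SELECT
    COUNT(*)                          AS total_rows,
    COUNT(DISTINCT primaryCustomerID) AS distinct_customers,
    SUM(CASE WHEN householdnumber = householdid THEN 1 ELSE 0 END) AS rows_where_number_equals_id
FROM TableA;

-- Distinct (householdnumber, householdid, primaryCustomerID) combinations.
SELECT COUNT(*) AS distinct_combinations
FROM (
    SELECT DISTINCT householdnumber, householdid, primaryCustomerID
    FROM TableA
) AS d;
```

If rows_where_number_equals_id equals total_rows, the two household columns always agree (rows where either column is NULL are counted as not equal here).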

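Regarding the code: it looks like the difference between the larger (original) version and the simpler GROUP BY version is that the original finds the minimum profilecreateddate for anyone in that household, whereas your simpler version finds the minimum profilecreateddate for the specific primarycustomerid.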

For example (using simpler data):

CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);
INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate) VALUES
(1, 1, 1, '20201001'),
(1, 1, 1, '20201002'),
(1, 1, 2, '20201003');

SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM #TableA;

SELECT 
    A.*,
    MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE 
INTO #Profile
from #Households AS a 
    LEFT JOIN #TableA AS B  
      ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;

SELECT * FROM #Profile;
/* -- Results
householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
1                1            1                  2020-10-01 00:00:00.000
1                1            2                  2020-10-01 00:00:00.000
*/

SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE 
INTO #Profile2
from #TableA
GROUP BY householdnumber, householdid, primaryCustomerID;

SELECT * FROM #Profile2;
/* -- Results
householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
1                1            1                  2020-10-01 00:00:00.000
1                1            2                  2020-10-03 00:00:00.000
*/

As you can see above, the PROFILECREATEDDATE in row 2 is different.

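So you could try the code below, which should give the same results as the original set. See how the timing goes (and confirm that it matches the original results).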

SELECT DISTINCT t1.householdnumber, t1.householdid, primaryCustomerID, 
        MIN([ProfileCreatedDate]) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE 
INTO #Profile3
FROM #TableA t1;

SELECT * FROM #Profile3;
/* -- Results
householdnumber  householdid  primaryCustomerID  PROFILECREATEDDATE
1                1            1                  2020-10-01 00:00:00.000
1                1            2                  2020-10-01 00:00:00.000
*/