SQL: Why are DISTINCT and MAX not removing duplicates?

Question

Shouldn't the query below remove duplicates?

SELECT DISTINCT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM            DimAccount AS ACC RIGHT OUTER JOIN
                             (SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
                               FROM            DimAccount
                               WHERE        (AccountStatus = 'Current')
                               GROUP BY PropertyID, TenancyStartDate) AS Relevant ON ACC.PropertyID = Relevant.PropertyID AND ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType, ACC.TenancyType

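As I understand it (and this is what I want to happen), the query in brackets selects each property ID with a status of 'Current' and returns its highest tenancy start date (albeit several times). That result is then joined back to the original table on start date and property ID to get the most recent tenancy type.

Why is it still returning duplicated rows?

(By the way, this relates to another question I asked yesterday, but apparently replies shouldn't turn into a conversation, so I thought I'd split this out. I hope that is the right thing to do. I have searched, but clearly something is missing in my understanding!)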

Answer 1

First, SELECT DISTINCT is almost never needed when using GROUP BY.

The problem with your query is the GROUP BY clause in the subquery. A corrected version:

SELECT Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType
FROM DimAccount ACC RIGHT OUTER JOIN
     (SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
      FROM  DimAccount
      WHERE (AccountStatus = 'Current')
      GROUP BY PropertyID
    ) Relevant
     ON ACC.PropertyID = Relevant.PropertyID AND
        ACC.TenancyStartDate = Relevant.Tenancystart
GROUP BY Relevant.PropertyID, ACC.TenancyStartDate, ACC.AccountID, ACC.TenancyType;

The subquery's GROUP BY should not include TenancyStartDate. In addition, your outer query listed ACC.TenancyType twice in its GROUP BY.
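To see why the extra grouping column matters, here is a minimal sketch that runs both forms of the subquery against an in-memory SQLite database from Python. The sample rows, the 'Secure' and 'Introductory' tenancy types, and the use of SQLite are assumptions for illustration; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimAccount (AccountID INTEGER, PropertyID INTEGER, "
    "TenancyStartDate TEXT, TenancyType TEXT, AccountStatus TEXT)"
)
# Hypothetical sample rows; dates are ISO strings so MAX() orders them correctly.
conn.executemany(
    "INSERT INTO DimAccount VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "2019-01-01", "Secure", "Current"),
        (2, 100, "2021-06-01", "Introductory", "Current"),
        (3, 200, "2020-03-15", "Secure", "Current"),
    ],
)

# Grouping by both columns: every (PropertyID, TenancyStartDate) pair is its
# own group, so MAX() simply returns that group's only date.
per_pair = conn.execute(
    "SELECT PropertyID, MAX(TenancyStartDate) FROM DimAccount "
    "WHERE AccountStatus = 'Current' "
    "GROUP BY PropertyID, TenancyStartDate ORDER BY 1, 2"
).fetchall()

# Grouping by PropertyID only: one row per property, carrying its latest date.
per_property = conn.execute(
    "SELECT PropertyID, MAX(TenancyStartDate) FROM DimAccount "
    "WHERE AccountStatus = 'Current' "
    "GROUP BY PropertyID ORDER BY 1"
).fetchall()

print(per_pair)      # two rows for property 100
print(per_property)  # one row for property 100
```

With the original grouping, property 100 still appears once per start date, which is what later shows up as "duplicates" after the join.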

That said, the query is easier to write using analytic functions:

select a.*
from (select a.*,
             max(tenancystartdate) over (partition by propertyid) as max_tsd
      from dimaccount a
      where accountstatus = 'Current'
     ) a
where tenancystartdate = max_tsd;

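This is not exactly the same as your query, because your query also takes non-current records into account: the join back to DimAccount has no status filter, so any account sharing the property and start date is returned. I'm guessing that might be intentional, though.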
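The difference can be made concrete with a small sketch in Python and an in-memory SQLite database. The rows and the 'Former' status value are hypothetical, and SQLite 3.25 or later is assumed for window-function support. A plain JOIN stands in for the RIGHT OUTER JOIN because older SQLite versions lack it; every derived row has a match in DimAccount here, so the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimAccount (AccountID INTEGER, PropertyID INTEGER, "
    "TenancyStartDate TEXT, TenancyType TEXT, AccountStatus TEXT)"
)
# Hypothetical data: accounts 2 and 3 share a property and start date,
# but only account 2 is 'Current'.
conn.executemany(
    "INSERT INTO DimAccount VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "2019-01-01", "Secure", "Former"),
        (2, 100, "2021-06-01", "Secure", "Current"),
        (3, 100, "2021-06-01", "Secure", "Former"),
    ],
)

# Window-function version: the status filter runs first, so only
# 'Current' rows can be returned.
window_ids = [r[0] for r in conn.execute("""
    SELECT a.AccountID
    FROM (SELECT d.*,
                 MAX(TenancyStartDate) OVER (PARTITION BY PropertyID) AS max_tsd
          FROM DimAccount d
          WHERE AccountStatus = 'Current') a
    WHERE a.TenancyStartDate = a.max_tsd
    ORDER BY a.AccountID
""")]

# Join version: the derived table is filtered by status, but the join back
# to DimAccount is not, so the 'Former' account 3 matches as well.
join_ids = [r[0] for r in conn.execute("""
    SELECT ACC.AccountID
    FROM DimAccount ACC
    JOIN (SELECT PropertyID, MAX(TenancyStartDate) AS Tenancystart
          FROM DimAccount
          WHERE AccountStatus = 'Current'
          GROUP BY PropertyID) Relevant
      ON ACC.PropertyID = Relevant.PropertyID
     AND ACC.TenancyStartDate = Relevant.Tenancystart
    ORDER BY ACC.AccountID
""")]

print(window_ids)  # current account only
print(join_ids)    # current account plus the non-current one
```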

Answer 2

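To answer your question: yes, you are right, there should be no duplicates. And I am pretty sure there are none. I am also pretty sure that your query doesn't do what you think it does.

This is your derived table: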

SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart
FROM            DimAccount
WHERE        (AccountStatus = 'Current')
GROUP BY PropertyID, TenancyStartDate

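Because you group by PropertyID and TenancyStartDate, you get one row per PropertyID and TenancyStartDate. For each such row you ask for MAX(TenancyStartDate), which is of course TenancyStartDate itself. There is no other field that you aggregate, so you are not really aggregating at all; you are only making the rows distinct, which is what DISTINCT is for. Then you also apply DISTINCT to get unique result records, but your records are already unique, thanks to this obfuscated way of doing it. In effect you are saying: select the distinct records of the distinct records. Your subquery can be rewritten as: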

SELECT DISTINCT PropertyID, TenancyStartDate
FROM DimAccount
WHERE AccountStatus = 'Current'
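As a quick check of that equivalence, the following sketch runs the original derived table and the DISTINCT rewrite side by side in an in-memory SQLite database from Python. The rows and the 'Former' status value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimAccount "
    "(PropertyID INTEGER, TenancyStartDate TEXT, AccountStatus TEXT)"
)
# Hypothetical rows, including a repeated (property, date) pair.
conn.executemany(
    "INSERT INTO DimAccount VALUES (?, ?, ?)",
    [
        (100, "2019-01-01", "Current"),
        (100, "2021-06-01", "Current"),
        (100, "2021-06-01", "Current"),
        (200, "2020-03-15", "Former"),
    ],
)

# The derived table exactly as written in the question.
original = sorted(conn.execute(
    "SELECT DISTINCT PropertyID, MAX(TenancyStartDate) AS Tenancystart "
    "FROM DimAccount WHERE AccountStatus = 'Current' "
    "GROUP BY PropertyID, TenancyStartDate"
))

# The simpler rewrite.
rewritten = sorted(conn.execute(
    "SELECT DISTINCT PropertyID, TenancyStartDate "
    "FROM DimAccount WHERE AccountStatus = 'Current'"
))

print(original == rewritten)  # both list each current (property, date) pair once
```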

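Then you outer join the DimAccount table, so you keep the records you found even when there is no matching DimAccount record. But you selected those records from DimAccount in the first place, so there is always at least the record you already found. Your outer join is therefore really an inner join. Next, the only field you show from the derived table is PropertyID, which always equals ACC.PropertyID. This means you are merely selecting records from ACC, and the derived table is only there to make sure that a 'Current' record exists for that PropertyID and TenancyStartDate. Your query can hence be rewritten as: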

SELECT DISTINCT 
  PropertyID, TenancyStartDate, AccountID, TenancyType
FROM DimAccount AS ACC 
WHERE EXISTS
(
  SELECT *
  FROM DimAccount CurrentAccount
  WHERE CurrentAccount.AccountStatus = 'Current'
  AND CurrentAccount.PropertyID = ACC.PropertyID
  AND CurrentAccount.TenancyStartDate = ACC.TenancyStartDate
);

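If PropertyID + TenancyStartDate + AccountID + TenancyType is unique (is AccountID the ID of the table?), then you can even remove DISTINCT.

This query first gets all 'Current' DimAccount records and then gives you every record with the same PropertyID and TenancyStartDate. From your explanation, however, it seems you want to select the latest 'Current' DimAccount record per PropertyID. That is something entirely different. There are different solutions for such a task depending on the DBMS you are using (you didn't specify yours in the tags).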
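One possible approach, on a DBMS that supports window functions, is to number each property's 'Current' rows from newest to oldest and keep the first. The sketch below is not tied to the asker's system: it uses Python with an in-memory SQLite database (3.25 or later assumed), made-up rows, and AccountID as an assumed tie-breaker when two current accounts share a start date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimAccount (AccountID INTEGER, PropertyID INTEGER, "
    "TenancyStartDate TEXT, TenancyType TEXT, AccountStatus TEXT)"
)
# Hypothetical rows; account 3 is newer than account 2 but not 'Current'.
conn.executemany(
    "INSERT INTO DimAccount VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, "2019-01-01", "Secure", "Current"),
        (2, 100, "2021-06-01", "Introductory", "Current"),
        (3, 100, "2022-02-01", "Secure", "Former"),
        (4, 200, "2020-03-15", "Secure", "Current"),
    ],
)

# Rank the 'Current' rows of each property by start date, newest first,
# then keep only the top-ranked row per property.
latest = conn.execute("""
    SELECT PropertyID, TenancyStartDate, AccountID, TenancyType
    FROM (SELECT d.*,
                 ROW_NUMBER() OVER (PARTITION BY PropertyID
                                    ORDER BY TenancyStartDate DESC,
                                             AccountID DESC) AS rn
          FROM DimAccount d
          WHERE AccountStatus = 'Current') ranked
    WHERE rn = 1
    ORDER BY PropertyID
""").fetchall()

print(latest)  # exactly one row per property
```

Because ROW_NUMBER assigns a single rank 1 per partition, this returns exactly one row per property even when start dates tie, which a MAX-based join does not guarantee.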

sql

duplicates

visual-studio