Modify GROUP BY to achieve correct results

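I have a large dataset containing historical data on inactive clients and their billing (service) information. I need to group the data by ClientId, Service, and Year and produce aggregate fields such as TotalCharges, TotalProvidedServices, and DaysStayedWithUs (this is the problematic column).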

I can successfully group the data by ClientId and Service, but adding Year to the grouping produces wrong results, and I know why.

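The reason is that the Year column is derived from ServiceDate, the same column used to compute FirstServiceDate and LastServiceDate, from which DaysStayedWithUs is calculated (LastServiceDate - FirstServiceDate gives the number of days the client stayed with us). When I group by year, the query finds FirstServiceDate and LastServiceDate within each year and computes DaysStayedWithUs from those, so I have to find another way to get the result I need (shown below).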
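A minimal sketch of that effect, assuming SQL Server and a hypothetical three-row inline sample (the CTE named Billing here is a stand-in, not the real table). Once YEAR(ServiceDate) is part of the GROUP BY, MIN and MAX are evaluated inside each year, so the year with a single visit collapses to zero days:

```sql
-- Hypothetical inline sample: one client/service with visits in two years.
WITH Billing (ClientId, Service, ServiceDate) AS (
    SELECT v.ClientId, v.Service, CAST(v.ServiceDate AS date)
    FROM (VALUES
        (1234567890, 'Cleaning', '2018-12-04'),
        (1234567890, 'Cleaning', '2019-01-22'),
        (1234567890, 'Cleaning', '2019-03-26')
    ) AS v (ClientId, Service, ServiceDate)
)
SELECT ClientId,
       Service,
       YEAR(ServiceDate) AS [Year],
       MIN(ServiceDate)  AS FirstServiceDate,   -- first date within the year only
       MAX(ServiceDate)  AS LastServiceDate,    -- last date within the year only
       DATEDIFF(DAY, MIN(ServiceDate), MAX(ServiceDate)) AS DaysStayedWithUs
FROM Billing
GROUP BY ClientId, Service, YEAR(ServiceDate)
ORDER BY [Year];
-- 2018: 2018-12-04 .. 2018-12-04 ->  0 days
-- 2019: 2019-01-22 .. 2019-03-26 -> 63 days
```

Without the year in the GROUP BY, the same three rows give a single span from 2018-12-04 to 2019-03-26, which is 112 days.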

Client table:

 ClientId

1234567890

Billing table:

ClientId       Service      ServiceDate   ChargesTotal

1234567890     Cleaning       12/4/2018       190.17
1234567890     Cleaning       1/22/2019        97.8
1234567890     Cleaning       1/29/2019        97.8
1234567890     Cleaning       2/5/2019         97.8
1234567890     Cleaning       2/12/2019        97.8
1234567890     Cleaning       3/5/2019         97.8
1234567890     Cleaning       2/19/2019        97.8
1234567890     Cleaning       3/12/2019        97.8
1234567890     Cleaning       3/19/2019        97.8
1234567890     Cleaning       3/26/2019        97.8

My code:

SELECT GroupedTable.ClientId, 
        GroupedTable.Service, 
        ---GroupedTable.Year,
        GroupedTable.FirstServiceDate, 
        GroupedTable.LastServiceDate, 
        GroupedTable.TotalCharges, 
        GroupedTable.TotalProvidedServices, 
        GroupedTable.DaysStayedWithUs

FROM (SELECT MainTable.ClientId, 
        MainTable.Service,
        SUM(MainTable.Charges) AS TotalCharges,      
        COUNT(*) AS TotalProvidedServices, 
        DATEDIFF(DAY, MIN(MainTable.ServiceDate), MAX(MainTable.ServiceDate)) as DaysStayedWithUs,
        ---MainTable.Year,
        MIN(MainTable.ServiceDate) AS FirstServiceDate,
        MAX(MainTable.ServiceDate) AS LastServiceDate

        FROM (SELECT c.ClientId, 
                b.Service,
                b.ServiceDate,
                ---YEAR(b.ServiceDate) AS Year, 
                b.Charges

            FROM Client as c

            LEFT JOIN Billing as b
                ON (c.ClientId = b.ClientId)

            WHERE b.ClientId = 1234567890

            ) as MainTable

---GROUP BY MainTable.ClientId, MainTable.ServiceIdentifier, MainTable.Year) AS GroupedTable
GROUP BY MainTable.ClientId, MainTable.Service) AS GroupedTable

The code above produces the correct result (with Year left out):

If I include Year in the GROUP BY, the output is:

P.S. I've highlighted the problematic cells in yellow (technically, FirstServiceDate and LastServiceDate need to be corrected, and DaysStayedWithUs will then adjust itself).

The result I need:

I hope this is achievable. Thank you.

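Use the window functions MIN OVER and MAX OVER to check whether a given year is the first or last year for that client and service. In the first year, use the actual first service date; in the last year, use the actual last service date; in any other case, fall back to January 1 and December 31 of that year.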

SELECT 
  c.clientid, 
  b.service,
  YEAR(b.servicedate) as year, 
  SUM(b.charges) AS total_charges,
  COUNT(*) AS total_provided_services,
  CASE WHEN YEAR(b.servicedate) = MIN(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
    THEN MIN(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 1, 1)
  END AS first_service_date, 
  CASE WHEN YEAR(b.servicedate) = MAX(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
    THEN MAX(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 12, 31)
  END AS last_service_date,
  DATEDIFF(DAY, 
    CASE WHEN YEAR(b.servicedate) = MIN(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
      THEN MIN(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 1, 1) END, 
    CASE WHEN YEAR(b.servicedate) = MAX(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
      THEN MAX(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 12, 31) END 
  ) AS days_stayed_with_us
FROM client AS c
LEFT JOIN billing AS b ON b.clientid = c.clientid
WHERE c.clientid = 1234567890
GROUP BY c.clientid, b.service, YEAR(b.servicedate)
ORDER BY c.clientid, b.service, YEAR(b.servicedate);
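The query above writes each CASE expression twice. One possible way to state the same idea once is sketched below, assuming SQL Server. The inline `billing` CTE is a hypothetical three-row stand-in for the real table, and the names `per_year`, `first_in_year`, `last_in_year`, `first_overall` and `last_overall` are made up for illustration. It also swaps `MIN(YEAR(...)) OVER` for a window over the per-year aggregates, `MIN(MIN(servicedate)) OVER`, which serves the same purpose.

```sql
-- Hypothetical inline sample standing in for the billing table.
WITH billing (clientid, service, servicedate, charges) AS (
    SELECT v.clientid, v.service, CAST(v.servicedate AS date), v.charges
    FROM (VALUES
        (1234567890, 'Cleaning', '2018-12-04', 190.17),
        (1234567890, 'Cleaning', '2019-01-22',  97.80),
        (1234567890, 'Cleaning', '2019-03-26',  97.80)
    ) AS v (clientid, service, servicedate, charges)
),
per_year AS (
    SELECT
        clientid,
        service,
        YEAR(servicedate) AS service_year,
        SUM(charges)      AS total_charges,
        COUNT(*)          AS total_provided_services,
        MIN(servicedate)  AS first_in_year,
        MAX(servicedate)  AS last_in_year,
        -- Window over the grouped rows: overall first/last date per client and service.
        MIN(MIN(servicedate)) OVER (PARTITION BY clientid, service) AS first_overall,
        MAX(MAX(servicedate)) OVER (PARTITION BY clientid, service) AS last_overall
    FROM billing
    GROUP BY clientid, service, YEAR(servicedate)
)
SELECT
    p.clientid,
    p.service,
    p.service_year,
    p.total_charges,
    p.total_provided_services,
    d.first_service_date,
    d.last_service_date,
    DATEDIFF(DAY, d.first_service_date, d.last_service_date) AS days_stayed_with_us
FROM per_year AS p
CROSS APPLY (
    -- Clamp to the year boundaries unless this is the first or last year.
    SELECT
        CASE WHEN p.service_year = YEAR(p.first_overall) THEN p.first_in_year
             ELSE DATEFROMPARTS(p.service_year, 1, 1) END   AS first_service_date,
        CASE WHEN p.service_year = YEAR(p.last_overall) THEN p.last_in_year
             ELSE DATEFROMPARTS(p.service_year, 12, 31) END AS last_service_date
) AS d
ORDER BY p.clientid, p.service, p.service_year;
-- 2018: 2018-12-04 .. 2018-12-31 -> 27 days
-- 2019: 2019-01-01 .. 2019-03-26 -> 84 days
```

Against the real tables, the first CTE would be dropped and `per_year` would read from `billing` directly (joined to `client` if clients without billing rows must appear).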