Modify Group By to achieve correct results
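I have a large data set that contains historical data for inactive clients along with their billing (service) information. I need to group the data by ClientId, Service and Year and generate aggregate fields such as TotalCharges, TotalProvidedServices and DaysStayedWithUs (this is the column I have a problem with).
I was able to group the data by ClientId and Service successfully, but adding Year to the grouping produces incorrect results, and I know why.
The reason is that the Year column is derived from ServiceDate, the same column that is used to compute FirstServiceDate and LastServiceDate, which in turn give DaysStayedWithUs (LastServiceDate - FirstServiceDate is the number of days the client stayed with us). When I group by Year, the query finds FirstServiceDate and LastServiceDate within each year and calculates DaysStayedWithUs from those, so I need another way to get the result I want (shown below).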
Client table:
ClientId
1234567890
Billing table:
ClientId Service ServiceDate ChargesTotal
1234567890 Cleaning 12/4/2018 190.17
1234567890 Cleaning 1/22/2019 97.8
1234567890 Cleaning 1/29/2019 97.8
1234567890 Cleaning 2/5/2019 97.8
1234567890 Cleaning 2/12/2019 97.8
1234567890 Cleaning 3/5/2019 97.8
1234567890 Cleaning 2/19/2019 97.8
1234567890 Cleaning 3/12/2019 97.8
1234567890 Cleaning 3/19/2019 97.8
1234567890 Cleaning 3/26/2019 97.8
My code:
SELECT GroupedTable.ClientId,
       GroupedTable.Service,
       -- GroupedTable.Year,
       GroupedTable.FirstServiceDate,
       GroupedTable.LastServiceDate,
       GroupedTable.TotalCharges,
       GroupedTable.TotalProvidedServices,
       GroupedTable.DaysStayedWithUs
FROM (SELECT MainTable.ClientId,
             MainTable.Service,
             SUM(MainTable.Charges) AS TotalCharges,
             COUNT(*) AS TotalProvidedServices,
             DATEDIFF(DAY, MIN(MainTable.ServiceDate), MAX(MainTable.ServiceDate)) AS DaysStayedWithUs,
             -- MainTable.Year,
             MIN(MainTable.ServiceDate) AS FirstServiceDate,
             MAX(MainTable.ServiceDate) AS LastServiceDate
      FROM (SELECT c.ClientId,
                   b.Service,
                   b.ServiceDate,
                   -- YEAR(b.ServiceDate) AS Year,
                   b.Charges
            FROM Client AS c
            LEFT JOIN Billing AS b
                   ON (c.ClientId = b.ClientId)
            WHERE b.ClientId = 1234567890
           ) AS MainTable
      -- GROUP BY MainTable.ClientId, MainTable.Service, MainTable.Year) AS GroupedTable
      GROUP BY MainTable.ClientId, MainTable.Service) AS GroupedTable
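The code above produces the correct result (with Year commented out):
If I include Year in the GROUP BY, the output is:
P.S. I marked the problematic cells in yellow (technically only FirstServiceDate and LastServiceDate need to be corrected; DaysStayedWithUs will then adjust itself).
The result I need to achieve:
Hopefully this is achievable.
Thanks.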
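To make the problem concrete, here is a minimal sketch that reproduces the per-year grouping on the sample Billing rows. It is not the original query: it assumes SQLite (through Python's sqlite3 module) rather than SQL Server, so YEAR and DATEDIFF are replaced with strftime and julianday, and the lowercase table and column names are stand-ins chosen for the example.

```python
import sqlite3

# Sample Billing rows from the question, with dates rewritten as ISO strings.
ROWS = [
    ("Cleaning", "2018-12-04", 190.17),
    ("Cleaning", "2019-01-22", 97.8),
    ("Cleaning", "2019-01-29", 97.8),
    ("Cleaning", "2019-02-05", 97.8),
    ("Cleaning", "2019-02-12", 97.8),
    ("Cleaning", "2019-03-05", 97.8),
    ("Cleaning", "2019-02-19", 97.8),
    ("Cleaning", "2019-03-12", 97.8),
    ("Cleaning", "2019-03-19", 97.8),
    ("Cleaning", "2019-03-26", 97.8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing (clientid INTEGER, service TEXT, servicedate TEXT, charges REAL)"
)
conn.executemany("INSERT INTO billing VALUES (1234567890, ?, ?, ?)", ROWS)

# Naive per-year grouping: MIN/MAX only see the rows inside each year,
# so the day count is measured within the year instead of across the stay.
NAIVE_SQL = """
SELECT CAST(strftime('%Y', servicedate) AS INTEGER) AS year,
       MIN(servicedate) AS first_service_date,
       MAX(servicedate) AS last_service_date,
       CAST(julianday(MAX(servicedate)) - julianday(MIN(servicedate)) AS INTEGER)
           AS days_stayed_with_us
FROM billing
GROUP BY clientid, service, strftime('%Y', servicedate)
ORDER BY year
"""
naive = conn.execute(NAIVE_SQL).fetchall()
for row in naive:
    print(row)
```

On these rows the 2018 group contains a single service date, so its day count is 0, and the 2019 group runs from 2019-01-22 to 2019-03-26, which is 63 days.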
Use the window functions MIN OVER and MAX OVER to check whether a given year is the first or last year for the client and service.
SELECT
  c.clientid,
  b.service,
  YEAR(b.servicedate) AS year,
  SUM(b.charges) AS total_charges,
  COUNT(*) AS total_provided_services,
  -- First year for this client and service: real first date; otherwise January 1.
  CASE WHEN YEAR(b.servicedate) = MIN(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
       THEN MIN(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 1, 1)
  END AS first_service_date,
  -- Last year for this client and service: real last date; otherwise December 31.
  CASE WHEN YEAR(b.servicedate) = MAX(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
       THEN MAX(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 12, 31)
  END AS last_service_date,
  -- Days between the two bounds computed above.
  DATEDIFF(DAY,
    CASE WHEN YEAR(b.servicedate) = MIN(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
         THEN MIN(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 1, 1) END,
    CASE WHEN YEAR(b.servicedate) = MAX(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
         THEN MAX(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 12, 31) END
  ) AS days_stayed_with_us
FROM client AS c
LEFT JOIN billing AS b ON b.clientid = c.clientid
WHERE c.clientid = 1234567890
GROUP BY c.clientid, b.service, YEAR(b.servicedate)
ORDER BY c.clientid, b.service, YEAR(b.servicedate);
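As a quick check of the idea on the sample rows from the question, here is a minimal sketch that can be run without SQL Server. It is a port, not the query above: it assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, swaps YEAR, DATEFROMPARTS and DATEDIFF for strftime, printf and julianday, and splits the logic into two CTEs (per_year and bounded, both names made up for this example) for readability.

```python
import sqlite3

# Sample Billing rows from the question, with dates rewritten as ISO strings.
ROWS = [
    ("Cleaning", "2018-12-04", 190.17),
    ("Cleaning", "2019-01-22", 97.8),
    ("Cleaning", "2019-01-29", 97.8),
    ("Cleaning", "2019-02-05", 97.8),
    ("Cleaning", "2019-02-12", 97.8),
    ("Cleaning", "2019-03-05", 97.8),
    ("Cleaning", "2019-02-19", 97.8),
    ("Cleaning", "2019-03-12", 97.8),
    ("Cleaning", "2019-03-19", 97.8),
    ("Cleaning", "2019-03-26", 97.8),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing (clientid INTEGER, service TEXT, servicedate TEXT, charges REAL)"
)
conn.executemany("INSERT INTO billing VALUES (1234567890, ?, ?, ?)", ROWS)

QUERY = """
WITH per_year AS (
    -- Step 1: plain aggregation per client, service and year.
    SELECT clientid, service,
           CAST(strftime('%Y', servicedate) AS INTEGER) AS year,
           SUM(charges) AS total_charges,
           COUNT(*) AS total_provided_services,
           MIN(servicedate) AS min_date,
           MAX(servicedate) AS max_date
    FROM billing
    GROUP BY clientid, service, strftime('%Y', servicedate)
),
bounded AS (
    -- Step 2: keep the real date only in the first/last year of the stay;
    -- every other year is clamped to January 1 or December 31.
    SELECT year, total_charges, total_provided_services,
           CASE WHEN year = MIN(year) OVER (PARTITION BY clientid, service)
                THEN min_date ELSE printf('%04d-01-01', year) END AS first_service_date,
           CASE WHEN year = MAX(year) OVER (PARTITION BY clientid, service)
                THEN max_date ELSE printf('%04d-12-31', year) END AS last_service_date
    FROM per_year
)
SELECT year, total_charges, total_provided_services,
       first_service_date, last_service_date,
       CAST(julianday(last_service_date) - julianday(first_service_date) AS INTEGER)
           AS days_stayed_with_us
FROM bounded
ORDER BY year
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

With the sample data, 2018 runs from 2018-12-04 to 2018-12-31 (27 days) and 2019 runs from 2019-01-01 to 2019-03-26 (84 days). The two values add up to 111 rather than the 112 days of the whole stay, because the step from December 31 to January 1 falls in neither year; whether that matters depends on how the per-year figures are meant to be used.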