从开始日期和结束日期构建时间表

Build timeline from start and end dates

我有一个订阅 table,具有 用户 ID订阅开始日期订阅结束日期。我还有一个带有 datestamp 字段的日历 table,即从我订阅的第一个订阅日期开始的每个日期 table.

我正在尝试写一些东西,它会给我一个 table,其中包含一个日期列和三个数字:总活跃人数(当天)、新订阅者人数、取消订阅者人数。

(N.B。我尝试使用建议的 GitHub Flavored Markdown 插入示例 tables,但它只是全部排成一行。)

目前我正在玩一个在两个 table 之间创建多个连接的查询,每个数字一个:

select a.datestamp
,count(distinct case when b_sub.UserID is not null then b_sub.UserID end) as total_w_subscription
,count(distinct case when b_in.UserID is not null then b_in.UserID end) as total_subscribed
,count(distinct case when b_out.UserID is not null then b_out.UserID end) as total_unsubscribed

from Calendar as a

left join Subscription as b_sub -- all those with subscription on given date
on b_sub.sub_dt <= a.datestamp
and (b_sub.unsub_dt > a.datestamp or b_sub.unsub_dt is null)

left join Subscription as b_in -- all those that subscribed on given date
on b_in.sub_dt = a.datestamp

left join Subscription as b_out -- all those that unsubscribed on given date
on b_out.unsub_dt = a.datestamp

where a.datestamp > '2021-06-10'

group by a.datestamp
order by datestamp asc
;

我已将两个 table 中的日期字段编入索引。如果我只看一天,它会在 3 秒内运行。两天已经很长了。 Sub table 超过 260 万条记录,理想情况下,我需要我的时间表从 2012 年的某个时候开始。

执行此操作最省时的方法是什么?

你走在正确的轨道上。我创建了一些 table 变量并假定每个订阅的数据结构都包含开始和结束日期。

--Create @dates table variable for calendar
DECLARE @startDate DATETIME = '2018-01-01'
DECLARE @endDate DATETIME = '2021-06-18'
DECLARE @dates TABLE
(
reportingdate DATETIME
)
WHILE @startDate <= @endDate
BEGIN
    INSERT INTO @dates SELECT @startDate
    SET @startDate += 1
END

--Create @subscriptions table variable for subcriptions to join onto calendar
DECLARE @subscriptions TABLE
(
id INT
,startDate DATETIME
,endDate DATETIME
)
INSERT INTO @subscriptions
VALUES
(1,'2018-01-01 00:00:00.000','2019-10-07 00:00:00.000')
,(2,'2018-01-11 00:00:00.000','2019-12-21 00:00:00.000')
,(3,'2019-04-21 00:00:00.000','2020-03-19 00:00:00.000')
,(4,'2019-12-09 00:00:00.000','2020-05-14 00:00:00.000')
,(5,'2020-04-26 00:00:00.000','2020-07-06 00:00:00.000')
,(6,'2020-05-02 00:00:00.000',NULL)
,(7,'2020-08-31 00:00:00.000','2020-10-29 00:00:00.000')
,(8,'2020-12-13 00:00:00.000','2021-01-13 00:00:00.000')
,(9,'2021-02-12 00:00:00.000','2021-04-19 00:00:00.000')
,(10,'2021-06-10 00:00:00.000',NULL)
;

然后我在日历上加入订阅table。

--CTE to join subscription onto calendar and use ROW_NUMBER functions
WITH cte AS (
SELECT
    s.id AS SubID
    ,d.ReportingDate
    ,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate) AS asc_rn   --used to identify 1st
    ,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate DESC) AS desc_rn --used to identify last
    ,CASE WHEN s.endDate IS NULL THEN 1 ELSE 0 END AS ActiveSub
FROM @subscriptions s
LEFT JOIN @dates d ON
    d.reportingdate BETWEEN s.startDate AND ISNULL(s.endDate,'9999-12-31')
    )

我使用 ROW_NUMBER 来识别订阅的第一个和最后一个日期行,以及检查订阅结束日期是否为 NULL(仍然有效)。然后我查询 CTE 以计算按天分组的订阅,以及按天分组的新订阅和终止订阅的总和。

--Query CTE using asc_rn, desc_rn, and ActiveSub to identify new subscribers and unsubscribers.
SELECT
    ReportingDate
    ,COUNT(*) AS TotalSubscribers
    ,SUM(CASE WHEN asc_rn = 1 THEN 1 ELSE 0 END) AS NewSubscribers
    ,SUM(CASE WHEN desc_rn = 1 AND ActiveSub = 0 THEN 1 ELSE 0 END) AS UnSubscribers
FROM cte
GROUP BY ReportingDate
ORDER BY ReportingDate