在 SQL 中的日期之间每天计算用户留存率
Calculating user retention on daily basis between the dates in SQL
我有一个 table,其中包含有关 user_ids 的数据,他们所有最后 log_in 日期到应用程序
Table:
|----------|--------------|
| User_Id | log_in_dates |
|----------|--------------|
| 1 | 2021-09-01 |
| 1 | 2021-09-03 |
| 2 | 2021-09-02 |
| 2 | 2021-09-04 |
| 3 | 2021-09-01 |
| 3 | 2021-09-02 |
| 3 | 2021-09-03 |
| 3 | 2021-09-04 |
| 4 | 2021-09-03 |
| 4 | 2021-09-04 |
| 5 | 2021-09-01 |
| 6 | 2021-09-01 |
| 6 | 2021-09-09 |
|----------|--------------|
从上面table,我试图了解用户从今天到过去 90 天的登录行为。
Num_users_no_log_in
定义从 present_day
到前几天(last_log_in_date
)
未登录应用程序的用户数
我想要 table 如下所示:
|---------------|------------------|--------------------|-------------------------|
| present_date | days_difference | last_log_in_date | Num_users_no_log_in |
|---------------|------------------|--------------------|-------------------------|
| 2021-09-01 | 0 | 2021-09-01 | 0 |
| 2021-09-02 | 1 | 2021-09-01 | 3 |->(Id = 1,5,6)
| 2021-09-02 | 0 | 2021-09-02 | 3 |->(Id = 1,5,6)
| 2021-09-03 | 2 | 2021-09-01 | 2 |->(Id = 5,6)
| 2021-09-03 | 1 | 2021-09-02 | 1 |->(Id = 2)
| 2021-09-03 | 0 | 2021-09-03 | 3 |->(Id = 2,5,6)
| 2021-09-04 | 3 | 2021-09-01 | 2 |->(Id = 5,6)
| 2021-09-04 | 2 | 2021-09-02 | 0 |
| 2021-09-04 | 1 | 2021-09-03 | 1 |->(Id= 1)
| 2021-09-04 | 0 | 2021-09-04 | 3 |->(Id = 1,5,6)
| .... | .... | .... | ....
|---------------|------------------|--------------------|-------------------------|
我能够使用以下查询获得前三列 Present_date | days_difference | last_log_in_date
:
with dts as
(
select distinct log_in from users_table
)
select x.log_in_dates as present_date,
DATEDIFF(DAY, y.log_in_dates ,x.log_in_dates ) as Days_since_last_log_in,
y.log_in_dates as log_in_dates
from dts x, dts y
where x.log_in_dates >= y.log_in_dates
我不明白我怎么能得到第四列Num_users_no_log_in
我不太明白你的需求:是否有基于用户或日期的值?它是基于日期的,看起来像(在其他地方您可能会将 user_id 作为第一列),多次出现相同的日期是什么意思?我知道你想回顾一下从开始到当前日期的所有日期,但在我看来并没有真正意义(想象一下你的仪表板在 1 年内!!)
说到这里,进入方法
在这种情况下,我使用 common table extensions 逐步开发。例如,它需要 3 个步骤:
- 准备时间序列
- 整合连接的日期并执行第一次计算(时差)
- 最后计算每天的nb连接
然后,最终查询将显示所需的结果。
这是我提出的查询,是用 Postgresql 开发的(你没有精确你的 dbms,但在这里转换应该不是什么大问题):
with init_calendar as (
-- Prepare date series and count total users
select generate_series(min(log_in_dates), now(), interval '1 day') as present_date,
count(distinct user_id) as nb_users
from users
),
calendar as (
-- Add connections' dates for each period from the beginning to current date in calendar
-- and calculate nb days difference for each of them
-- Syntax my vary depending dbms used
select distinct present_date, log_in_dates as last_date,
extract(day from present_date - log_in_dates) as days_difference,
nb_users
from init_calendar
join users on log_in_dates <= present_date
),
usr_con as (
-- Identify last user connection's dates according to running date
-- Tag the line to be counted as no connection
select c.present_date, c.last_date, c.days_difference, c.nb_users,
u.user_id, max(log_in_dates) as last_con,
case when max(log_in_dates) = present_date then 0 else 1 end as to_count
from calendar c
join users u on u.log_in_dates <= c.last_date
group by c.present_date, c.last_date, c.days_difference, c.nb_users, u.user_id
)
select present_date, last_date, days_difference,
nb_users - sum(to_count) as Num_users_no_log_in
from usr_con
group by present_date, last_date, days_difference, nb_users
order by present_date, last_date
请注意,您在计算中忘记了 user_id = 3,这与您自己的预期结果存在差异。
如果你想玩查询,你可以使用 dbfiddle
我有一个 table,其中包含有关 user_ids 的数据,他们所有最后 log_in 日期到应用程序
Table:
|----------|--------------|
| User_Id | log_in_dates |
|----------|--------------|
| 1 | 2021-09-01 |
| 1 | 2021-09-03 |
| 2 | 2021-09-02 |
| 2 | 2021-09-04 |
| 3 | 2021-09-01 |
| 3 | 2021-09-02 |
| 3 | 2021-09-03 |
| 3 | 2021-09-04 |
| 4 | 2021-09-03 |
| 4 | 2021-09-04 |
| 5 | 2021-09-01 |
| 6 | 2021-09-01 |
| 6 | 2021-09-09 |
|----------|--------------|
从上面table,我试图了解用户从今天到过去 90 天的登录行为。
Num_users_no_log_in
定义从 present_day
到前几天(last_log_in_date
)
我想要 table 如下所示:
|---------------|------------------|--------------------|-------------------------|
| present_date | days_difference | last_log_in_date | Num_users_no_log_in |
|---------------|------------------|--------------------|-------------------------|
| 2021-09-01 | 0 | 2021-09-01 | 0 |
| 2021-09-02 | 1 | 2021-09-01 | 3 |->(Id = 1,5,6)
| 2021-09-02 | 0 | 2021-09-02 | 3 |->(Id = 1,5,6)
| 2021-09-03 | 2 | 2021-09-01 | 2 |->(Id = 5,6)
| 2021-09-03 | 1 | 2021-09-02 | 1 |->(Id = 2)
| 2021-09-03 | 0 | 2021-09-03 | 3 |->(Id = 2,5,6)
| 2021-09-04 | 3 | 2021-09-01 | 2 |->(Id = 5,6)
| 2021-09-04 | 2 | 2021-09-02 | 0 |
| 2021-09-04 | 1 | 2021-09-03 | 1 |->(Id= 1)
| 2021-09-04 | 0 | 2021-09-04 | 3 |->(Id = 1,5,6)
| .... | .... | .... | ....
|---------------|------------------|--------------------|-------------------------|
我能够使用以下查询获得前三列 Present_date | days_difference | last_log_in_date
:
with dts as
(
select distinct log_in from users_table
)
select x.log_in_dates as present_date,
DATEDIFF(DAY, y.log_in_dates ,x.log_in_dates ) as Days_since_last_log_in,
y.log_in_dates as log_in_dates
from dts x, dts y
where x.log_in_dates >= y.log_in_dates
我不明白我怎么能得到第四列Num_users_no_log_in
我不太明白你的需求:是否有基于用户或日期的值?它是基于日期的,看起来像(在其他地方您可能会将 user_id 作为第一列),多次出现相同的日期是什么意思?我知道你想回顾一下从开始到当前日期的所有日期,但在我看来并没有真正意义(想象一下你的仪表板在 1 年内!!)
说到这里,进入方法
在这种情况下,我使用 common table extensions 逐步开发。例如,它需要 3 个步骤:
- 准备时间序列
- 整合连接的日期并执行第一次计算(时差)
- 最后计算每天的nb连接
然后,最终查询将显示所需的结果。
这是我提出的查询,是用 Postgresql 开发的(你没有精确你的 dbms,但在这里转换应该不是什么大问题):
with init_calendar as (
-- Prepare date series and count total users
select generate_series(min(log_in_dates), now(), interval '1 day') as present_date,
count(distinct user_id) as nb_users
from users
),
calendar as (
-- Add connections' dates for each period from the beginning to current date in calendar
-- and calculate nb days difference for each of them
-- Syntax my vary depending dbms used
select distinct present_date, log_in_dates as last_date,
extract(day from present_date - log_in_dates) as days_difference,
nb_users
from init_calendar
join users on log_in_dates <= present_date
),
usr_con as (
-- Identify last user connection's dates according to running date
-- Tag the line to be counted as no connection
select c.present_date, c.last_date, c.days_difference, c.nb_users,
u.user_id, max(log_in_dates) as last_con,
case when max(log_in_dates) = present_date then 0 else 1 end as to_count
from calendar c
join users u on u.log_in_dates <= c.last_date
group by c.present_date, c.last_date, c.days_difference, c.nb_users, u.user_id
)
select present_date, last_date, days_difference,
nb_users - sum(to_count) as Num_users_no_log_in
from usr_con
group by present_date, last_date, days_difference, nb_users
order by present_date, last_date
请注意,您在计算中忘记了 user_id = 3,这与您自己的预期结果存在差异。 如果你想玩查询,你可以使用 dbfiddle