Using PostgreSQL, how can I count the number of individuals who opened a message in the previous 30 days from the Monday of each week?
Scenario:
I have a table, events_table, that holds records inserted by a webhook for the messages I send to users:
"column_name" (type)
- "time_stamp" (timestamp with time zone)
- "username" (varchar)
- "delivered" (integer)
- "action" (integer)
Sample data:
| time_stamp | username | delivered | action |
|:----------------|:---------|:----------|:-------|
|1349733421.460000| user1 | 1 | null |
|1549345346.460000| user3 | 1 | 1 |
|1524544421.460000| user1 | 1 | 1 |
|1345444421.570000| user7 | 1 | null |
|1756756761.980000| user9 | 1 | null |
|1234343421.460000| user171 | 1 | 1 |
|1843455621.460000| user5 | 1 | 1 |
| ... | ... | ... | ... |
The "delivered" column defaults to null and is 1 when the message is delivered. The "action" column defaults to null and is 1 when the message is opened.
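To experiment with the queries in this post, the sample rows can be loaded into a scratch table. This is a sketch under an assumption: the sample values, and the `to_timestamp("time_stamp")` calls in the queries, suggest that "time_stamp" actually holds Unix epoch seconds rather than a `timestamptz`, so it is declared `double precision` here. The table is `TEMP`, so it disappears at the end of the session. While it exists, it shadows any permanent events_table for that session.

```sql
-- Scratch copy of the sample rows from the question.
-- ASSUMPTION: "time_stamp" stores Unix epoch seconds, which is what
-- to_timestamp("time_stamp") in the queries expects.
CREATE TEMP TABLE events_table (
    "time_stamp" double precision,
    "username"   varchar,
    "delivered"  integer,
    "action"     integer
);

INSERT INTO events_table ("time_stamp", "username", "delivered", "action") VALUES
    (1349733421.46, 'user1',   1, NULL),
    (1549345346.46, 'user3',   1, 1),
    (1524544421.46, 'user1',   1, 1),
    (1345444421.57, 'user7',   1, NULL),
    (1756756761.98, 'user9',   1, NULL),
    (1234343421.46, 'user171', 1, 1),
    (1843455621.46, 'user5',   1, 1);
```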
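Problem:
Using PostgreSQL, how can I count the number of people who opened an email in the previous 30 days from the Monday of each week?
Ideal query result: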
| date | count |
|:----------------|:----------|
| 02/24/2020 | 1,234,123 |
| 02/17/2020 | 234,123 |
| 02/10/2020 | 1,234,123 |
| 02/03/2020 |12,341,213 |
| ... | ... |
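The ideal result shows dates as MM/DD/YYYY and counts with thousands separators. If you want the database to produce that formatting instead of the client, `to_char` can do it. The values in this sketch are taken from the table above. The `date` and `count` aliases are chosen only to match that table.

```sql
-- FM suppresses the padding that to_char would otherwise add.
SELECT
    to_char(date '2020-02-24', 'MM/DD/YYYY') AS date,   -- '02/24/2020'
    to_char(1234123, 'FM999,999,999')         AS count;  -- '1,234,123'
```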
My attempt:
This is as far as I've gotten; it gives me the count for the previous week:
SELECT
date_trunc('week', to_timestamp("time_stamp")) as date,
count("username") as count,
lag(count(1), 1) over (order by date_trunc('week', to_timestamp("time_stamp"))) as "count_previous_week"
FROM events_table
WHERE "delivered" = 1
and "action" = 1
GROUP BY 1 order by 1 desc
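Both queries in this post use `date_trunc('week', …)` to find "the Monday of each week". PostgreSQL truncates to ISO weeks, and ISO weeks start on Monday. The quick check below uses one of the sample epoch values. `to_timestamp()` returns a `timestamptz`, so the session time zone decides which calendar day an event falls on. For that reason the sketch pins the zone to UTC, which is an assumption made only for this example.

```sql
-- to_timestamp() returns timestamptz; pin the zone so the day is unambiguous.
SET TIME ZONE 'UTC';

-- 1549345346.46 is 2019-02-05 05:42 UTC, a Tuesday.
SELECT
    to_timestamp(1549345346.46)                           AS ts,
    date_trunc('week', to_timestamp(1549345346.46))::date AS week_monday;
-- week_monday = 2019-02-04
```

Events close to midnight can land on a different day, and occasionally in a different week, depending on the session time zone.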
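Here's my attempt at writing this query.
First, I get the lowest and highest dates from the data set. I add 7 days to the highest date to make sure data up to today is included.
Then I run generate_series against these 2 values with a 7-day interval, which gives me every Monday between the two points (we can't rely only on the Mondays present in your data, in case there are empty weeks).
Then I simply subquery and aggregate the data based on the generate_series output.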
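Here is the `generate_series` step on its own. It uses literal bounds taken from the ideal result in the question. Stepping by 7 days from a Monday produces every Monday up to the end bound, including weeks that have no events. The start bound must already be a Monday, which is why the query derives it with `date_trunc('week', …)`.

```sql
-- Every Monday from 2020-02-03 through 2020-02-24.
-- The date arguments are promoted to a timestamp type, so cast back to date.
SELECT generate_series(
           date '2020-02-03',
           date '2020-02-24',
           interval '7 days'
       )::date AS week_begins;
-- 2020-02-03, 2020-02-10, 2020-02-17, 2020-02-24
```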
select
__weeks.week_begins,
(
select
count(distinct "username")
from
events_table
where
to_timestamp("time_stamp")::date between week_begins - '30 days'::interval and week_begins
and "delivered" = 1
and "action" = 1
)
from
(
select
generate_series(_.min_date, _.max_date, '7 days'::interval)::date as week_begins
from
(
select
min(date_trunc('week', to_timestamp("time_stamp"))::date) as min_date,
max(date_trunc('week', to_timestamp("time_stamp"))::date) + 7 as max_date
from
events_table
where
"delivered" = 1
and "action" = 1
) as _
) as __weeks
order by
__weeks.week_begins
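A note on the window bounds in the query above: `BETWEEN` is inclusive on both ends, so each Monday's window spans 31 calendar days (the Monday itself plus the 30 days before it). The sketch below shows this for the Monday 2020-02-24 from the ideal result. The half-open condition in the comments is a hypothetical rewrite, shown only in case "previous 30 days" is meant to exclude the Monday.

```sql
-- Inclusive window used by the correlated subquery for one Monday.
SELECT
    date '2020-02-24' - 30 AS window_start,  -- 2020-01-25
    date '2020-02-24'      AS window_end;    -- 2020-02-24 (inclusive)

-- HYPOTHETICAL alternative: a half-open range [week_begins - 30, week_begins)
-- covers exactly the 30 days before the Monday:
--   to_timestamp("time_stamp") >= week_begins - 30
--   and to_timestamp("time_stamp") < week_begins
```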
I'm not particularly fond of this query because the query planner visits the same table twice, but I can't think of another way to structure it.
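One way to have the planner read events_table only once is to pull the qualifying (username, day) pairs into a CTE and reuse that CTE for both the week bounds and the 30-day windows. This is a sketch, not a tested drop-in replacement. It assumes PostgreSQL 12 or later for `AS MATERIALIZED`; older versions always materialize CTEs and reject the keyword. It also assumes that "time_stamp" holds epoch seconds, as the `to_timestamp()` calls imply. Whether it is actually faster depends on the data, so compare both versions with `EXPLAIN (ANALYZE)`.

```sql
-- ASSUMPTIONS: PostgreSQL 12+ (AS MATERIALIZED); "time_stamp" is epoch seconds.
-- Read events_table once: one row per user per day on which they opened a message.
WITH opens AS MATERIALIZED (
    SELECT DISTINCT
        "username",
        to_timestamp("time_stamp")::date AS open_day
    FROM events_table
    WHERE "delivered" = 1
      AND "action" = 1
),
-- Every Monday from the first week with an open to one week past the last.
weeks AS (
    SELECT gs::date AS week_begins
    FROM (
        SELECT min(open_day) AS min_day, max(open_day) AS max_day
        FROM opens
    ) AS bounds,
    generate_series(
        date_trunc('week', bounds.min_day),
        date_trunc('week', bounds.max_day) + interval '7 days',
        interval '7 days'
    ) AS gs
)
-- Distinct openers in the inclusive window [Monday - 30 days, Monday];
-- the LEFT JOIN keeps weeks with no opens (count = 0).
SELECT
    w.week_begins AS date,
    count(DISTINCT o."username") AS count
FROM weeks AS w
LEFT JOIN opens AS o
       ON o.open_day BETWEEN w.week_begins - 30 AND w.week_begins
GROUP BY w.week_begins
ORDER BY w.week_begins DESC;
```

The range join is evaluated per week, much like the original correlated subquery. The gain comes from scanning and filtering events_table once instead of twice.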