How to calculate weekly and monthly appearances of distinct string values in SQL Google Big Query?
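I'm new to SQL, and I have a dataset with a date column and a domain column. The domain column contains only the values 'personal' and 'business'. What I want to do is compute weekly and monthly rolling counts for each domain type.
My idea was to create 2 separate columns, is_personal and is_business, holding a 1 in the rows where domain_type has the matching value. For example, if domain_type is 'personal', the is_personal column would be 1; otherwise the 1 would go in is_business. Then I would compute a rolling sum.
However, I'm wondering whether I can avoid creating extra columns and compute the weekly and monthly rolling counts directly from the string column in Google Big Query.
So far, I have tried using DATE_TRUNC(CAST(created_at AS date), ISOWEEK) to roll the dates up by week and then GROUP BY that date column. When I try any rolling function on the domain_type column, I get a lot of errors. Some are about functions that Google Big Query doesn't recognize, some are about the fact that I'm working with a string column, and so on.
The end goal is to compute weekly and monthly rolling counts for the 'business' and 'personal' domain types. Please let me know if there is any other information I can provide that would help. Thanks!
Current data: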
Date domain_type
2017-10-02 personal
2017-10-03 business
2017-10-04 personal
2017-10-05 business
2017-10-06 personal
2017-10-07 business
2017-10-08 personal
2017-10-09 business
2017-10-10 personal
2017-10-11 business
2017-10-12 personal
2017-10-13 business
2017-10-14 personal
2017-10-15 business
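Suppose that in the week of 2017-10-02 a total of 10 users signed up with a personal email address and 20 users signed up with a business email address. In the week of 2017-10-09, 25 users signed up with a personal email address and 30 with a business email address. So after the 2 weeks, the rolling count is 35 for the personal domain type and 50 for the business domain type.
The output I want to achieve: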
Date domain_type rolling_count_for_week
2017-10-02 personal 10
2017-10-02 business 20
2017-10-09 personal 35
2017-10-09 business 50
If you want the number of distinct values within a week, use aggregation:
select date_trunc(date, week) as wk, email_type,
count(*) -- or count(distinct email) if they are not already unique
from t
group by wk, email_type
order by 1, 2;
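I don't see anything "rolling" about what you are trying to do, unless, perhaps, you want counts over two consecutive weeks. If that is the case, use a window function: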
select date_trunc(date, week) as wk, email_type,
count(*) as this_week,
sum(count(*)) over (partition by email_type order by date_trunc(date, week) rows between 1 preceding and current row) as two_week_count
from t
group by wk, email_type
order by 1, 2;
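On the question of avoiding the helper columns is_personal and is_business: one possible sketch is to count the string values conditionally with COUNTIF inside the weekly aggregation. The CTE name `signups`, the column name `signup_date`, and the inline rows are made up for illustration; only `domain_type` and its two values come from the question.

```sql
#standardSQL
-- Hypothetical inline sample data; replace `signups` with your own table.
WITH signups AS (
  SELECT DATE '2017-10-02' AS signup_date, 'personal' AS domain_type UNION ALL
  SELECT DATE '2017-10-03', 'business' UNION ALL
  SELECT DATE '2017-10-04', 'personal' UNION ALL
  SELECT DATE '2017-10-09', 'business' UNION ALL
  SELECT DATE '2017-10-10', 'personal'
)
SELECT
  DATE_TRUNC(signup_date, ISOWEEK) AS wk,                   -- Monday that starts the ISO week
  COUNTIF(domain_type = 'personal') AS personal_count,      -- rows where the condition is true
  COUNTIF(domain_type = 'business') AS business_count
FROM signups
GROUP BY wk
ORDER BY wk
```

This returns one row per week with a column per domain type, so no 0/1 flag columns need to be stored. It gives per-week counts only; a running total would still need a window function on top.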
WITH
weekly AS
(
SELECT
DATE_TRUNC(CAST(created_at AS date), ISOWEEK) AS created_week,
*
FROM
yourData
)
SELECT
created_week,
domain_type,
SUM(COUNT(*)) OVER (PARTITION BY domain_type ORDER BY created_week) AS cumulative_emails
FROM
weekly
GROUP BY
created_week,
domain_type
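The question also asks for monthly counts. A minimal sketch of the same cumulative pattern at month granularity, assuming the `yourData`, `created_at`, and `domain_type` names used above; the inline rows are invented sample data and `emails_this_month` is an extra column added for illustration:

```sql
#standardSQL
-- Hypothetical inline sample data standing in for the real table.
WITH yourData AS (
  SELECT TIMESTAMP '2017-10-02 08:00:00' AS created_at, 'personal' AS domain_type UNION ALL
  SELECT TIMESTAMP '2017-10-03 09:30:00', 'business' UNION ALL
  SELECT TIMESTAMP '2017-10-20 12:00:00', 'personal' UNION ALL
  SELECT TIMESTAMP '2017-11-05 10:15:00', 'personal' UNION ALL
  SELECT TIMESTAMP '2017-11-06 16:45:00', 'business'
),
monthly AS (
  SELECT
    DATE_TRUNC(CAST(created_at AS DATE), MONTH) AS created_month,  -- first day of the month
    domain_type
  FROM yourData
)
SELECT
  created_month,
  domain_type,
  COUNT(*) AS emails_this_month,
  -- Running total per domain type, ordered by month
  SUM(COUNT(*)) OVER (PARTITION BY domain_type ORDER BY created_month) AS cumulative_emails
FROM monthly
GROUP BY created_month, domain_type
ORDER BY created_month, domain_type
```

The only change from the weekly version is the date part passed to DATE_TRUNC. A month with no rows for a given domain type produces no output row for it, so if gaps matter a calendar of months would have to be joined in first.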
Below is for BigQuery Standard SQL
#standardSQL
SELECT Date, domain_type,
SUM(IF(domain_type = 'personal', personal, business)) AS rolling_count_for_week
FROM (
SELECT Date, type AS domain_type,
SUM(IF(domain_type = 'personal' AND domain_type = type, 1, 0)) OVER(ORDER BY Date) personal,
SUM(IF(domain_type = 'business' AND domain_type = type, 1, 0)) OVER(ORDER BY Date) business
FROM `project.dataset.table`,
UNNEST(['personal', 'business']) type
)
WHERE EXTRACT(DAYOFWEEK FROM Date) = 2
GROUP BY Date, domain_type
If applied to the sample data in your question, the output is
Row Date domain_type rolling_count_for_week
1 2017-10-02 personal 1
2 2017-10-02 business 0
3 2017-10-09 personal 4
4 2017-10-09 business 4
What if, for one particular week, there is no data on dow=2 but there is data for the other days?
Good point, I was assuming there is at least one entry for each day :o)
See below for a version without this dependency
#standardSQL
WITH calendar_type AS (
SELECT Date, type
FROM (
SELECT MIN(Date) min_date, MAX(Date) max_date
FROM `project.dataset.table`
), UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) Date,
UNNEST(['personal', 'business']) type
)
SELECT Date, domain_type,
SUM(IF(domain_type = 'personal', personal, business)) AS rolling_count_for_week
FROM (
SELECT c.Date, type AS domain_type,
SUM(IF(domain_type = 'personal' AND domain_type = type, 1, 0)) OVER(ORDER BY c.Date) personal,
SUM(IF(domain_type = 'business' AND domain_type = type, 1, 0)) OVER(ORDER BY c.Date) business
FROM calendar_type c
LEFT JOIN `project.dataset.table` t
ON c.Date = t.Date AND c.type = t.domain_type
)
WHERE EXTRACT(DAYOFWEEK FROM Date) = 2
GROUP BY Date, domain_type