BigQuery: is a "subtable" possible?
In BigQuery with legacy SQL, I created a horrible query that returns the daily visits for a website I released on 26 February 2018. The output looks like this:
Row date name release_date visits_count
1 20180226 a_name 20180226 2179
2 20180227 a_name 20180226 9522
3 20180228 a_name 20180226 1593
4 20180301 a_name 20180226 300
...
What I actually want is:
Row name release count_release count_release+1 count_release_rest
1 a_name 20180226 2179 9522 1893
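So I want the visit count on the release date, the count for the day after the release, and the sum of all counts after that.
I am new to BigQuery (and new to SQL...). Is there a way to define my first result as a "subtable" or something similar so that I can do this, or what approach would you recommend?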
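You can do this in several ways. The simplest is to compare the dates in CASE expressions and sum visits_count per bucket. Here your_table is a placeholder for the table or subquery that returns the daily rows, and the date columns are assumed to be of a type that DATE_ADD accepts:
select name,
  sum(case when date = release_date then visits_count else 0 end) as release_count,
  sum(case when date = DATE_ADD(release_date, 1, "DAY") then visits_count else 0 end) as release_count1,
  sum(case when date > DATE_ADD(release_date, 1, "DAY") then visits_count else 0 end) as release_count_other
from your_table
group by name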
The following is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '20180226' date, 'a_name' name, '20180226' release_date, 2179 visits_count UNION ALL
SELECT '20180227', 'a_name', '20180226', 9522 UNION ALL
SELECT '20180228', 'a_name', '20180226', 1593 UNION ALL
SELECT '20180301', 'a_name', '20180226', 300
)
SELECT name, release_date,
SUM(CASE WHEN date = release_date THEN visits_count END) count_release,
SUM(CASE WHEN PARSE_DATE('%Y%m%d', date) = DATE_ADD(PARSE_DATE('%Y%m%d', release_date), INTERVAL 1 DAY) THEN visits_count END) count_release_next_day,
SUM(CASE WHEN PARSE_DATE('%Y%m%d', date) > DATE_ADD(PARSE_DATE('%Y%m%d', release_date), INTERVAL 1 DAY) THEN visits_count END) count_release_rest
FROM `project.dataset.table`
GROUP BY name, release_date
Alternatively, the query above can be refactored to avoid repeating PARSE_DATE, which makes it more compact and easier to manage:
#standardSQL
WITH `project.dataset.table` AS (
SELECT '20180226' date, 'a_name' name, '20180226' release_date, 2179 visits_count UNION ALL
SELECT '20180227', 'a_name', '20180226', 9522 UNION ALL
SELECT '20180228', 'a_name', '20180226', 1593 UNION ALL
SELECT '20180301', 'a_name', '20180226', 300
)
SELECT name, release_date,
SUM(CASE WHEN date = release_date THEN visits_count END) count_release,
SUM(CASE WHEN visit = release_next_day THEN visits_count END) count_release_next_day,
SUM(CASE WHEN visit > release_next_day THEN visits_count END) count_release_rest
FROM `project.dataset.table`,
UNNEST([STRUCT<visit DATE, release_next_day DATE>(
PARSE_DATE('%Y%m%d', date),
DATE_ADD(PARSE_DATE('%Y%m%d', release_date), INTERVAL 1 DAY))]) x
GROUP BY name, release_date
The result in both cases is
Row name release_date count_release count_release_next_day count_release_rest
1 a_name 20180226 2179 9522 1893
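On the "subtable" part of the question: in Standard SQL, an existing query can be wrapped in a WITH clause (or a subquery in FROM) and then queried like a table. The sketch below is one possible way to do that, not the only one. The names daily, parsed and days_since_release are made up for illustration, and the sample rows stand in for your existing query, which would need to be expressed in Standard SQL to be used this way.

```sql
#standardSQL
WITH daily AS (
  -- Stand-in for the existing query that returns one row per day.
  SELECT '20180226' AS date, 'a_name' AS name, '20180226' AS release_date, 2179 AS visits_count UNION ALL
  SELECT '20180227', 'a_name', '20180226', 9522 UNION ALL
  SELECT '20180228', 'a_name', '20180226', 1593 UNION ALL
  SELECT '20180301', 'a_name', '20180226', 300
),
parsed AS (
  -- Parse the YYYYMMDD strings once and compute the offset from the release date in days.
  SELECT
    name,
    release_date,
    visits_count,
    DATE_DIFF(PARSE_DATE('%Y%m%d', date), PARSE_DATE('%Y%m%d', release_date), DAY) AS days_since_release
  FROM daily
)
SELECT
  name,
  release_date,
  SUM(IF(days_since_release = 0, visits_count, 0)) AS count_release,           -- release day
  SUM(IF(days_since_release = 1, visits_count, 0)) AS count_release_next_day,  -- day after release
  SUM(IF(days_since_release > 1, visits_count, 0)) AS count_release_rest       -- everything later
FROM parsed
GROUP BY name, release_date
```

For the sample rows this gives 2179, 9522 and 1893 (1593 + 300), the same as the queries above. Using IF(..., visits_count, 0) instead of CASE without ELSE means a bucket with no matching rows comes back as 0 rather than NULL.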