BigQuery - Collapse rows with contiguous dates
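I have a table of sales targets. They are usually set per month, but they are loaded into the table as one row per day and market. For example, if the UK target for January is 1550, it is loaded as 31 rows (one per day in January), each with a target of 50 (1550 / 31 days).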
WITH targets AS (
  SELECT DATE '2018-01-01' AS date, 'uk' AS market, NUMERIC '50' AS target
  UNION ALL SELECT '2018-01-02', 'uk', 50
  UNION ALL SELECT '2018-01-03', 'uk', 50
  # ...
  UNION ALL SELECT '2018-01-31', 'uk', 50
  UNION ALL SELECT '2018-02-01', 'uk', 25
  UNION ALL SELECT '2018-02-02', 'uk', 25
  # ...
  UNION ALL SELECT '2018-02-27', 'uk', 25
  UNION ALL SELECT '2018-02-28', 'uk', 25
  UNION ALL SELECT '2018-03-01', 'uk', 50
  UNION ALL SELECT '2018-03-02', 'uk', 50
  UNION ALL SELECT '2018-03-03', 'uk', 50
  # ...
  UNION ALL SELECT '2018-03-31', 'uk', 50
)
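I want to collapse this so that each row has a dateFrom and a dateTo column, to reduce both the work of loading the data and the time/cost of querying it.
I did this by grouping by market and target and aggregating the minimum and maximum date and the sum of the target: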
SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  Market,
  target AS dailyTarget,
  SUM(target) AS target
FROM targets
GROUP BY Market, dailyTarget
ORDER BY dateFrom
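I expected three rows, but there is a problem: when months with the same market and target are separated by a month with a different target, we get overlapping rows. In the example above, January and March both have a daily target of 50, but February's is 25, so the 50 row runs from 2018-01-01 to 2018-03-31 and overlaps the February row.
I think the solution lies in using a window to group only rows whose date is adjacent to the previous row's date, but I can't work out how to implement it!
Thanks for your help!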
Below is for BigQuery Standard SQL
#standardSQL
SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  Market,
  target AS dailyTarget,
  SUM(target) AS target
FROM `project.dataset.targets`
GROUP BY Market, dailyTarget, DATE_TRUNC(date, MONTH)
ORDER BY dateFrom
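As you can see, you just need to add DATE_TRUNC(date, MONTH) to the GROUP BY. Note that this produces one row per market, daily target, and calendar month, so two adjacent months with the same daily target stay as separate rows.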
This is a gaps-and-islands problem. You can get the ranges using:
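select market, min(date) as dateFrom, max(date) as dateTo, target
from (select t.*,
             # position of the row among rows with the same market and target
             row_number() over (partition by market, target order by date) as seqnum_t,
             # position of the row among all rows for the market
             row_number() over (partition by market order by date) as seqnum
      from targets t
     ) t
# seqnum - seqnum_t stays constant within a run of consecutive rows that share a target
group by market, target, (seqnum - seqnum_t)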
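To make the row-number difference concrete, here is a minimal self-contained sketch for BigQuery Standard SQL. It is an illustration under stated assumptions rather than part of the original answer: the sample data is rebuilt with GENERATE_DATE_ARRAY so that no days are elided, and the names numbered, grp, and totalTarget are hypothetical.

```sql
#standardSQL
WITH targets AS (
  -- Assumed sample data mirroring the question: Jan = 50, Feb = 25, Mar = 50 per day.
  SELECT d AS date, 'uk' AS market, NUMERIC '50' AS target
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-01-01', DATE '2018-01-31')) AS d
  UNION ALL
  SELECT d, 'uk', NUMERIC '25'
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-02-01', DATE '2018-02-28')) AS d
  UNION ALL
  SELECT d, 'uk', NUMERIC '50'
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-03-01', DATE '2018-03-31')) AS d
),
numbered AS (
  SELECT
    date,
    market,
    target,
    -- Row number within the market minus row number within (market, target).
    -- The difference is constant inside each run of consecutive rows with the
    -- same target: 0 for January, 31 for February, 28 for March.
    ROW_NUMBER() OVER (PARTITION BY market ORDER BY date)
      - ROW_NUMBER() OVER (PARTITION BY market, target ORDER BY date) AS grp
  FROM targets
)
SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  market,
  target AS dailyTarget,
  SUM(target) AS totalTarget
FROM numbered
GROUP BY market, target, grp
ORDER BY dateFrom
```

With this sample data the query should return three non-overlapping rows: 2018-01-01 to 2018-01-31 (daily 50, total 1550), 2018-02-01 to 2018-02-28 (daily 25, total 700), and 2018-03-01 to 2018-03-31 (daily 50, total 1550). One caveat: the row numbers track row order, not calendar adjacency, so this assumes exactly one row per day and market. If days can be missing, a possible variant is to replace the first ROW_NUMBER() with a day count such as DATE_DIFF(date, DATE '1970-01-01', DAY), so that a missing day also starts a new group.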