Daily aggregation in BigQuery
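I have a table in BigQuery (the data warehouse):
The result I want is:
Here is how the result is calculated:
- 2017-10-01 = $100. This is obvious, because there is only one row.
- 2017-10-02 = $400, the sum of rows 1 and 3. Why? Rows 2 and 3 belong to the same invoice, so only the latest update is used.
- 2017-10-04 = $800, the sum of rows 1, 3 and 4. Why? Because each invoice is counted only once per day, using its latest row: row 1 (T001), row 3 (T002), row 4 (T003).
- 2017-10-05 = $1100, the sum of rows 1, 5 and 6. Why? Because each invoice is counted only once per day, using its latest row: row 1 (T001), row 5 (T002), row 6 (T003).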
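The rule in the list above (for each date, take the latest amount of every invoice updated on or before that date, then sum) can be cross-checked outside BigQuery. Below is a minimal Python sketch, assuming the six sample rows posted as JSON in this question; the name `daily_totals` is made up for illustration and is not part of any query in this thread.

```python
from datetime import date, datetime

# The six sample rows from the question: (UpdatedDate, InvNo, amount).
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]


def daily_totals(rows):
    """For each distinct update date, sum the latest amount of every invoice seen so far."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), inv, amt)
        for ts, inv, amt in rows
    )
    latest = {}  # invoice number -> most recent amount
    totals = {}  # date -> sum of latest amounts at the end of that date
    for ts, inv, amt in parsed:
        latest[inv] = amt
        # Rows are processed in time order, so the last row of a day
        # overwrites the total with the end-of-day value.
        totals[ts.date()] = sum(latest.values())
    return totals


for day, total in daily_totals(ROWS).items():
    print(day, total)
```

With the sample rows this gives 100, 400, 800 and 1100 for the four dates, which is the result the question asks for.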
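Honestly, I have no idea how to do this. I have tried GROUP BY many times, among other things, but none of the attempts worked as expected. This is my latest attempt today: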
SELECT
amount,
updatedDateOnly,
invNo
FROM
(
SELECT
invNo,
UpdatedDate,
amount,
DATE(updatedDate) as updatedDateOnly,
row_number() OVER (PARTITION BY invNo ORDER BY UpdatedDate DESC) AS rownum
FROM [project:dataset.test]
)
WHERE
rownum = 1
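This returns only the rows for the last date. I don't know how to run the query for every day.
Thanks to any expert willing to help with this query.
Update:
Here is the data as JSON, in case you want to try it in BigQuery or another SQL server: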
{"UpdatedDate":"2017-10-01 01:00:00","InvNo":"T001","amount":100}
{"UpdatedDate":"2017-10-02 01:00:00","InvNo":"T002","amount":200}
{"UpdatedDate":"2017-10-02 02:00:00","InvNo":"T002","amount":300}
{"UpdatedDate":"2017-10-04 01:00:00","InvNo":"T003","amount":400}
{"UpdatedDate":"2017-10-05 01:00:00","InvNo":"T002","amount":500}
{"UpdatedDate":"2017-10-05 02:00:00","InvNo":"T003","amount":500}
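For each date, you need the latest amount of every invoice. That is a bit tricky. One solution is to start from a cross join of dates and records, restricted to the records updated on or before each date. A window function can then be used to pick the most recent amount: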
select dte,
       -- seqnum = 1 marks the latest record of each invoice as of dte
       sum(case when seqnum = 1 then amount else 0 end) as amount
from (select d.dte, t.*,
             -- restart the numbering for every (date, invoice) pair
             row_number() over (partition by d.dte, t.InvNo order by t.UpdatedDate desc) as seqnum
      from (select distinct date(UpdatedDate) as dte
            from `project.dataset.test` t
           ) d join
           `project.dataset.test` t
           on date(t.UpdatedDate) <= d.dte
     ) td
group by dte;
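One detail worth checking in the window-function approach: the row number has to restart for every (date, invoice) pair. Numbering per invoice alone marks a single row across all dates, so most dates lose their amounts. The Python sketch below simulates both partitionings on the sample rows; `totals` and its `partition_key` parameter are hypothetical helpers for illustration, and ties are resolved here by keeping the first row, whereas SQL would pick one arbitrarily.

```python
from collections import defaultdict

# Sample rows from the question: (UpdatedDate, InvNo, amount).
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]


def totals(partition_key):
    """Sum, per date, the amounts of the rows that would get row number 1."""
    days = sorted({ts[:10] for ts, _, _ in ROWS})
    # Join: every date paired with every record updated on or before it.
    joined = [(d, ts, inv, amt) for d in days for ts, inv, amt in ROWS if ts[:10] <= d]
    # Row number 1 in a partition is the row with the greatest UpdatedDate.
    first = {}
    for row in joined:
        key = partition_key(row)
        if key not in first or row[1] > first[key][1]:
            first[key] = row
    result = defaultdict(int)
    for d, _, _, amt in first.values():
        result[d] += amt
    return dict(result)


per_date_and_invoice = totals(lambda r: (r[0], r[2]))  # partition by dte, InvNo
per_invoice_only = totals(lambda r: r[2])              # partition by InvNo only

print(per_date_and_invoice)
print(per_invoice_only)
```

Partitioning by date and invoice reproduces the expected 100, 400, 800 and 1100, while partitioning by invoice alone leaves amounts only on 2017-10-01 and 2017-10-05.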
The following is for BigQuery Standard SQL:
#standardSQL
WITH dates AS (
SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
FROM `project.dataset.test`
),
qualified AS (
SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM `project.dataset.test`
GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM dates d
JOIN qualified q
ON q.UpdatedDay <= d.UpdatedDay
GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
-- ORDER BY UpdatedDay
You can test / play with this using the dummy data from your question:
#standardSQL
WITH `project.dataset.test` AS (
SELECT TIMESTAMP '2017-10-01 01:00:00' UpdatedDate, 'T001' InvNo, 100 amount UNION ALL
SELECT TIMESTAMP '2017-10-02 01:00:00', 'T002', 200 UNION ALL
SELECT TIMESTAMP '2017-10-02 02:00:00', 'T002', 300 UNION ALL
SELECT TIMESTAMP '2017-10-04 01:00:00', 'T003', 400 UNION ALL
SELECT TIMESTAMP '2017-10-05 01:00:00', 'T002', 500 UNION ALL
SELECT TIMESTAMP '2017-10-05 02:00:00', 'T003', 500
),
dates AS (
SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
FROM `project.dataset.test`
),
qualified AS (
SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM `project.dataset.test`
GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
FROM dates d
JOIN qualified q
ON q.UpdatedDay <= d.UpdatedDay
GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
ORDER BY UpdatedDay
The result is as expected:
UpdatedDay amount
2017-10-01 100
2017-10-02 400
2017-10-04 800
2017-10-05 1100