Output the last instance of failure status from historical data in BigQuery
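I am still getting my head around BigQuery and hope you can help.
I have a dataset of failure statuses for different users over the past 2 months, scanned once a day.
Basically, my dataset looks like this: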
WITH failure_table AS
(SELECT 'Andrea' AS name, 'Failure' AS status, '2022-04-28 4:00:00' AS timestamp
UNION ALL SELECT 'Karl', 'Failure', '2022-04-28 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-04-27 4:00:00'
UNION ALL SELECT 'Karl', 'Failure', '2022-04-27 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-04-26 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-04-25 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-03-30 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-03-29 4:00:00'
UNION ALL SELECT 'Andrea', 'Failure', '2022-03-28 4:00:00'
UNION ALL SELECT 'Karl', 'Failure', '2022-03-28 4:00:00')
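What I want to output is the timestamp at which each user first started failing in their current streak, that is, the run of consecutive daily failures that continues up to today (2022-04-29).
So in this case, Andrea's and Karl's failures from March are not considered, because after 30 March they had success markers and did not fail again until late April. (I'm not sure I'm making sense, please let me know.)
So my expected output is: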
name | status | started failing timestamp | days failing |
---|---|---|---|
Andrea | Failure | 2022-04-25 4:00:00 | 4 |
Karl | Failure | 2022-04-27 4:00:00 | 2 |
Hope someone can help. Thanks!
Use the approach below:
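-- Assumes `timestamp` is a TIMESTAMP or DATETIME column; if it is stored as a
-- STRING (as in the sample data), it may need a cast before date() accepts it.
select name, status,
  min(timestamp) as started_failing_timestamp,
  date_diff(max(date(timestamp)), min(date(timestamp)), day) + 1 as days_failing
from (
  -- grp increments each time the gap to the previous failure exceeds one day
  select * except(flag), flag, countif(flag > 1) over(partition by name order by timestamp) grp
  from (
    -- flag = days since the same user's previous failure (null on the first row)
    select *, date_diff(date(timestamp), lag(date(timestamp)) over(partition by name order by timestamp), day) flag
    from failure_table
    where status = 'Failure'
  )
)
group by name, status, grp
-- keep only the most recent streak per user
qualify 1 = row_number() over(partition by name order by grp desc)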
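To see how `flag` and `grp` split each user's failures into streaks, the same logic can be traced outside BigQuery. The following is only an illustrative Python sketch of the idea, not a replacement for the query; the helper name `latest_streaks` is made up for this example, and the rows are the sample data from the question with the time of day dropped.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question as (name, scan date); times are dropped
# because the query only compares calendar dates.
rows = [
    ("Andrea", date(2022, 4, 28)), ("Karl", date(2022, 4, 28)),
    ("Andrea", date(2022, 4, 27)), ("Karl", date(2022, 4, 27)),
    ("Andrea", date(2022, 4, 26)), ("Andrea", date(2022, 4, 25)),
    ("Andrea", date(2022, 3, 30)), ("Andrea", date(2022, 3, 29)),
    ("Andrea", date(2022, 3, 28)), ("Karl", date(2022, 3, 28)),
]

def latest_streaks(rows):
    """Return {name: (first_day, days_failing)} for each user's last streak."""
    result = {}
    ordered = sorted(rows)  # by name, then by date
    for name, group in groupby(ordered, key=lambda r: r[0]):
        days = [d for _, d in group]
        start = days[0]
        for prev, cur in zip(days, days[1:]):
            # Same test as `flag > 1` in the query: a gap of more than one
            # day starts a new streak (a new `grp`).
            if (cur - prev).days > 1:
                start = cur
        # Same arithmetic as date_diff(max, min, day) + 1.
        result[name] = (start, (days[-1] - start).days + 1)
    return result

streaks = latest_streaks(rows)
for name, (start, n) in streaks.items():
    print(name, start, n)
```

Because only the last streak per user is kept, the March failures fall into an earlier group and are ignored, which mirrors what the `qualify` line does with `order by grp desc`.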
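If applied to the sample data in your question, the query above should return the rows listed as the expected output in the question.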
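One thing to keep in mind: the query returns each user's most recent streak, but it never compares that streak with today's date. A user who stopped failing weeks ago would still appear with their old streak. If "still failing up to today" matters, an extra check on the last failure date is needed. Here is a small Python sketch of that check; the function name `is_ongoing` and the `max_gap_days` tolerance are assumptions made for this illustration, and a similar comparison could be applied to the query's result.

```python
from datetime import date

def is_ongoing(last_failure: date, today: date, max_gap_days: int = 1) -> bool:
    """True if the streak's last failure is recent enough to count as ongoing."""
    # max_gap_days=1 allows for today's scan not having run yet.
    return 0 <= (today - last_failure).days <= max_gap_days

today = date(2022, 4, 29)  # "today" as stated in the question
print(is_ongoing(date(2022, 4, 28), today))  # last scan was yesterday
print(is_ongoing(date(2022, 3, 30), today))  # streak ended in March
```

With the sample data both users last failed on 2022-04-28, so both streaks would count as ongoing under this tolerance.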