Output the last instance of failure status from historical data in BigQuery

I'm still getting my head around BigQuery, so I hope you can help me.

I have a dataset of failure statuses for different users over the past 2 months, scanned once a day.

Basically, my dataset looks like this:

WITH failure_table AS
  (SELECT 'Andrea' AS name, 'Failure' AS status, '2022-04-28 4:00:00' AS timestamp
   UNION ALL SELECT 'Karl', 'Failure', '2022-04-28 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-27 4:00:00'
   UNION ALL SELECT 'Karl', 'Failure', '2022-04-27 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-26 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-04-25 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-30 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-29 4:00:00'
   UNION ALL SELECT 'Andrea', 'Failure', '2022-03-28 4:00:00'
   UNION ALL SELECT 'Karl', 'Failure', '2022-03-28 4:00:00')

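What I'd like to output is the timestamp at which each user started failing, counting only a run where they have been in failure status every consecutive day up to today (2022-04-29).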

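So in this case, Andrea's and Karl's failures from March would not be considered, because after March 30 they had success marks and did not fail again until late April. (I'm not sure I'm making sense, please let me know.)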

So my expected output is:

name    status   started_failing_timestamp  days_failing
Andrea  Failure  2022-04-25 4:00:00         4
Karl    Failure  2022-04-27 4:00:00         2

Hope someone can help. Thanks!

Use the approach below:

select name, status,
  min(timestamp) as started_failing_timestamp,
  date_diff(max(date(timestamp)), min(date(timestamp)), day) + 1 as days_failing
from (
  -- grp: running count of gaps longer than one day, so every unbroken
  -- run of daily failures for a user gets its own group id
  select *, countif(flag > 1) over(partition by name order by timestamp) grp
  from (
    -- flag: days since this user's previous failure (NULL on the first row)
    select *, date_diff(date(timestamp), lag(date(timestamp)) over(partition by name order by timestamp), day) flag
    from failure_table
    where status = 'Failure'
  )
)
group by name, status, grp
-- keep only the most recent run of failures for each user
qualify 1 = row_number() over(partition by name order by grp desc)

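If applied to the sample data in your question, the output is:

name    status   started_failing_timestamp  days_failing
Andrea  Failure  2022-04-25 4:00:00         4
Karl    Failure  2022-04-27 4:00:00         2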
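The query above returns each user's latest run of failures, but it does not check that the run is still ongoing, which the question asks for ("up to today"). Below is a minimal sketch of one way to add that check. It assumes every daily scan writes rows to failure_table, so the table's maximum date is the latest scan date. It reuses the question's failure_table CTE: when running both together, chain them as `with failure_table as (...), runs as (...)`. The CTE name `runs` and the column `last_failing_date` are illustrative names only.

```sql
-- Sketch (assumption): failure_table gets rows from every daily scan,
-- so max(date(timestamp)) over the whole table is the latest scan date.
with runs as (
  select name, status,
    min(timestamp) as started_failing_timestamp,
    max(date(timestamp)) as last_failing_date,  -- hypothetical helper column
    date_diff(max(date(timestamp)), min(date(timestamp)), day) + 1 as days_failing
  from (
    -- same island numbering as in the answer above
    select *, countif(flag > 1) over(partition by name order by timestamp) grp
    from (
      select *, date_diff(date(timestamp), lag(date(timestamp)) over(partition by name order by timestamp), day) flag
      from failure_table
      where status = 'Failure'
    )
  )
  group by name, status, grp
)
select name, status, started_failing_timestamp, days_failing
from runs
-- only a run that reaches the latest scan date is still "failing today";
-- runs are disjoint per user, so at most one run per user survives this filter
where last_failing_date = (select max(date(timestamp)) from failure_table)
```

Comparing against the table's own latest date rather than `current_date()` keeps the result stable when today's 4:00 scan has not been loaded yet. On the sample data both users' latest runs end on 2022-04-28, so the result is the same two rows as before, while a user whose last failure predates the latest scan would now be dropped.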