Isolate/Sum Data Conditionally Based On Event Sequence

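I have a table that tracks events in order and records the time elapsed for each event. Using the sample data below, I want to calculate the total time elapsed (in seconds) for all A events that occur before the second C event. So for the example below, I want an output of 550 seconds.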

Obviously,

select sum(timeelapse_seconds) where eventtype = "A"

returns 750 seconds, because it includes event #6.

Event EventType TimeElapse_Seconds
----------------------------------    
  1        C          50
  1        A         100
  2        A         100
  3        B         200
  4        A         350
  5        C         100
  6        A         200

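Thanks!

Update

Sorry, I just realized something about my data set: there will be an initial Event 1 of EventType C. So I need to find the second instance rather than the first (so min won't work). I have updated the sample table.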

One way is to use a window function:

select sum(TimeElapse_Seconds)
from (select t.*,
             -- event number of the first C row, repeated on every row
             min(case when eventtype = 'C' then event end) over () as min_c_event
      from t
     ) t
where event < min_c_event and eventtype = 'A';

Assuming Event defines the ordering, and therefore what "before" means...

SELECT sum(TimeElapse_Seconds)
FROM events
WHERE EventType = 'A' AND Event < (SELECT min(Event) FROM events WHERE EventType = 'C');

is one way. For best performance, you will want an index on (EventType, Event).
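As a rough sketch of the index suggestion, the snippet below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module, creates the index, and runs the query above. The index name idx_events_type_event and the choice of SQLite are assumptions made for illustration only. It also shows what happens with the updated sample data, where the first C is Event 1.

```python
import sqlite3

# Sample rows from the question: (Event, EventType, TimeElapse_Seconds).
ROWS = [
    (1, "C", 50), (1, "A", 100), (2, "A", 100), (3, "B", 200),
    (4, "A", 350), (5, "C", 100), (6, "A", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (Event INTEGER, EventType TEXT, TimeElapse_Seconds INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)

# Hypothetical index name; the column order follows the suggestion above.
conn.execute("CREATE INDEX idx_events_type_event ON events (EventType, Event)")

first_c_total = conn.execute(
    """
    SELECT sum(TimeElapse_Seconds)
    FROM events
    WHERE EventType = 'A'
      AND Event < (SELECT min(Event) FROM events WHERE EventType = 'C')
    """
).fetchone()[0]

# The first C is Event 1, so no A row has a smaller Event and SUM yields NULL.
print(first_c_total)  # None
```

On the updated sample this cutoff at the first C matches no A rows, which is consistent with the asker's remark that min alone is not enough.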

The following is for BigQuery Standard SQL

#standardSQL
SELECT SUM(TimeElapse_Seconds) TotalElapse_Seconds
FROM (
  SELECT EventType, TimeElapse_Seconds, 
    COUNTIF(EventType = 'C') OVER(ORDER BY Event) = 1 BeforeC
  FROM `project.dataset.table`
)
WHERE EventType = 'A' AND BeforeC 

If applied to the sample data in your question, the result is

Row TotalElapse_Seconds  
1   550  
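For engines without COUNTIF, the same running-count idea can probably be expressed with SUM(CASE ...) as a window function. The sketch below tries it on SQLite through Python's sqlite3 module; the table name events and the alias c_count are assumptions, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: (Event, EventType, TimeElapse_Seconds).
ROWS = [
    (1, "C", 50), (1, "A", 100), (2, "A", 100), (3, "B", 200),
    (4, "A", 350), (5, "C", 100), (6, "A", 200),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (Event INTEGER, EventType TEXT, TimeElapse_Seconds INTEGER)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)

before_second_c = conn.execute(
    """
    SELECT SUM(TimeElapse_Seconds)
    FROM (
      SELECT EventType, TimeElapse_Seconds,
             -- running number of C rows seen so far, in Event order
             SUM(CASE WHEN EventType = 'C' THEN 1 ELSE 0 END)
               OVER (ORDER BY Event) AS c_count
      FROM events
    )
    WHERE EventType = 'A' AND c_count = 1
    """
).fetchone()[0]

print(before_second_c)  # 550
```

Note that c_count = 1 keeps only rows from the first C up to just before the second C. If A events could precede the first C and should be counted too, c_count < 2 would be the filter to use; on this sample both give the same total.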

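Well, first you have to find where event C occurs for the second time, then sum the TimeElapse_Seconds values of all A events whose Event number is lower than that position. So: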

SELECT SUM(TimeElapse_Seconds)
FROM events
WHERE EventType = 'A' 
AND Event < (SELECT MIN(Event) -- Second appearance of event C
             FROM events
             WHERE EventType = 'C' 
               AND Event > (SELECT MIN(Event) -- First appearance of event C
                            FROM events
                            WHERE EventType = 'C'))
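To check the nested-MIN query against the question's sample data, here is a small sketch using Python's sqlite3 module with an in-memory database. The helper name sum_a_before_second_c is hypothetical, and SQLite stands in for whatever engine is actually in use.

```python
import sqlite3

QUERY = """
SELECT SUM(TimeElapse_Seconds)
FROM events
WHERE EventType = 'A'
AND Event < (SELECT MIN(Event) -- Second appearance of event C
             FROM events
             WHERE EventType = 'C'
               AND Event > (SELECT MIN(Event) -- First appearance of event C
                            FROM events
                            WHERE EventType = 'C'))
"""


def sum_a_before_second_c(rows):
    """Load rows into a fresh in-memory table and run the query above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (Event INTEGER, EventType TEXT, TimeElapse_Seconds INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    total = conn.execute(QUERY).fetchone()[0]
    conn.close()
    return total


# Sample rows from the question: (Event, EventType, TimeElapse_Seconds).
sample = [
    (1, "C", 50), (1, "A", 100), (2, "A", 100), (3, "B", 200),
    (4, "A", 350), (5, "C", 100), (6, "A", 200),
]

print(sum_a_before_second_c(sample))  # 550

# With only one C event there is no second occurrence, so the subquery is NULL.
only_one_c = [row for row in sample if row[0] != 5]
print(sum_a_before_second_c(only_one_c))  # None
```

The second call illustrates an edge case: when fewer than two C events exist, the comparison against NULL matches nothing and the sum comes back as NULL rather than as a total over all A events.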