Using an aggregate function that resets after condition is met?

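I'm working with event data and am trying to calculate the time spent in an application by summing the differences between each timestamp and the previous one. The problem is that I need this running total to reset every time the value in the 'packageName' column changes. I tried the following.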

SELECT    
    SUM(timeDifference) OVER(PARTITION BY packageName ORDER BY sNumber, timestamp) as accTime,
    *
FROM table.name
ORDER BY
    sNumber, timestamp

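However, the result turns out to be too smart. I need it to forget its aggregate after each partition ends, instead of remembering the earlier results and accumulating on top of them.

My question is whether there is any way to reset it. Below is an example of what I get and what my desired output is. Any help would be much appreciated.

What I get: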

**accTime      diff         packageName**
10              10          com.package.1
20              20          com.package.1
10              10          com.package.2
20              20          com.package.2
30              10          com.package.1

What I want:

**accTime      diff         packageName**
10              10          com.package.1
20              20          com.package.1
10              10          com.package.2
20              20          com.package.2
10              10          com.package.1

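The second example shows the accumulated time for the first package (com.package.1) being reset when it reappears, which is exactly what I need help with.

To help explain myself further, here is a sample of the raw data: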

**timestamp          packageName          sNumber      eventID      diff**
  1433119125117      com.package.1        xx123xx      event1       null
  1433119125200      com.package.1        xx123xx      event2         83
  1433119125400      com.package.2        xx123xx      event3        200
  1433119125600      com.package.2        xx123xx      event4        200
  1433119125800      com.package.1        xx123xx      event5        200

I've been playing with some samples in the meantime. This is not a complete answer, but it might help someone.

select 
  pos,label,diff,
  if (lag!=label or lag is null,1,0) as reset
from(
  select 
    pos,label,diff,
    LAG(label, 1) OVER (ORDER BY pos asc) lag,
  from (select 10 as diff,'first' as label, 1 as pos),
    (select 20 as diff,'first' as label, 2 as pos),
    (select 10 as diff,'second' as label, 3 as pos),
    (select 20 as diff,'second' as label, 4 as pos),
    (select 10 as diff,'first' as label, 5 as pos),
    (select 11 as diff,'first' as label, 6 as pos),
    (select 12 as diff,'first' as label, 7 as pos),
  order by pos
)

This returns:

+-----+-----+--------+------+-------+
| Row | pos | label  | diff | reset |
+-----+-----+--------+------+-------+
|   1 |   1 | first  |   10 |     1 |
|   2 |   2 | first  |   20 |     0 |
|   3 |   3 | second |   10 |     1 |
|   4 |   4 | second |   20 |     0 |
|   5 |   5 | first  |   10 |     1 |
|   6 |   6 | first  |   11 |     0 |
|   7 |   7 | first  |   12 |     0 |
+-----+-----+--------+------+-------+
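
One possible way to finish this off, as a sketch rather than a tested BigQuery query: a running sum of the reset flag gives every consecutive run of the same label its own group number, and the cumulative SUM can then be partitioned by that number. The example below runs the idea against an in-memory SQLite database through Python's sqlite3 module as a stand-in for BigQuery. The table name `events`, the column `grp` and the helper `running_totals` are made up for illustration, and it assumes an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3

# Sample rows from the query above: (pos, label, diff).
ROWS = [
    (1, "first", 10), (2, "first", 20),
    (3, "second", 10), (4, "second", 20),
    (5, "first", 10), (6, "first", 11), (7, "first", 12),
]

QUERY = """
WITH flagged AS (
  -- reset = 1 on the first row and whenever the label differs from the previous row
  SELECT pos, label, diff,
         CASE WHEN LAG(label) OVER (ORDER BY pos) = label THEN 0 ELSE 1 END AS reset
  FROM events
),
grouped AS (
  -- a running sum of the flag numbers each consecutive run of equal labels
  SELECT pos, label, diff,
         SUM(reset) OVER (ORDER BY pos) AS grp
  FROM flagged
)
SELECT pos, label, diff,
       SUM(diff) OVER (PARTITION BY grp ORDER BY pos) AS accTime
FROM grouped
ORDER BY pos
"""


def running_totals(rows):
    """Load rows into a throwaway table and return (pos, label, diff, accTime)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (pos INTEGER, label TEXT, diff INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in running_totals(ROWS):
        print(row)
```

On the seven sample rows the accTime column comes out as 10, 30, 10, 30, 10, 21, 33, so the total for 'first' starts over at pos 5. With real data you would presumably also add the device column (sNumber in the question) to the PARTITION BY clauses so that runs never span two devices.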

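Using the LAG function (you'll notice my answer looks a lot like Pentium's), I think this is what you want...

I'm not 100% sure, because your accTime seems to behave oddly relative to your diff... To me, accTime should be accTime + diff, shouldn't it? (Correct me if I'm wrong; now that the query is in place, it's easy to adjust :))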

SELECT
  timestamp,package,sNumber,eventID,diff,
  CASE WHEN lagPackage IS NULL then 0
  WHEN package != lagPackage THEN diff 
  ELSE (diff + IF(lagDiff is null, 0,lagDiff)) END AS accTime
FROM (
  SELECT
    *,
    LAG(package,1) OVER (ORDER BY timestamp) AS lagPackage,
    LAG(diff,1,0) OVER (ORDER BY timestamp) AS lagDiff
  FROM (
    SELECT
      1433119125117 AS timestamp,
      'com.package.1' AS package,
      'xxx123xxx' AS sNumber,
      'event1' AS eventID,
      NULL AS diff),
    (
    SELECT
      1433119125200 AS timestamp,
      'com.package.1' AS package,
      'xxx123xxx' AS sNumber,
      'event2' AS eventID,
      83 AS diff),
    (
    SELECT
      1433119125400 AS timestamp,
      'com.package.2' AS package,
      'xxx123xxx' AS sNumber,
      'event3' AS eventID,
      200 AS diff),
    (
    SELECT
      1433119125600 AS timestamp,
      'com.package.2' AS package,
      'xxx123xxx' AS sNumber,
      'event4' AS eventID,
      200 AS diff),
    (
    SELECT
      1433119125800 AS timestamp,
      'com.package.1' AS package,
      'xxx123xxx' AS sNumber,
      'event5' AS eventID,
      200 AS diff),
  ORDER BY
    timestamp )

Based on the sample set you provided, this returns:

Row  timestamp      package        sNumber    eventID  diff  accTime
1    1433119125117  com.package.1  xxx123xxx  event1   null  0
2    1433119125200  com.package.1  xxx123xxx  event2   83    83
3    1433119125400  com.package.2  xxx123xxx  event3   200   200
4    1433119125600  com.package.2  xxx123xxx  event4   200   400
5    1433119125800  com.package.1  xxx123xxx  event5   200   200
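
One thing that may be worth checking before relying on this: the CASE expression adds only the previous row's diff to the current one, so it equals a true running total only while a package has at most two consecutive rows. As a cross-check, here is a small plain-Python sketch of the reset-on-change total, run on the question's sample rows. The function name `acc_time_with_reset` is hypothetical, and treating a NULL diff as 0 is an assumption that mirrors the output above.

```python
from itertools import groupby

# (timestamp, package, diff) from the question's sample; None stands in for NULL.
SAMPLE = [
    (1433119125117, "com.package.1", None),
    (1433119125200, "com.package.1", 83),
    (1433119125400, "com.package.2", 200),
    (1433119125600, "com.package.2", 200),
    (1433119125800, "com.package.1", 200),
]


def acc_time_with_reset(rows):
    """Return one running total per row, restarting whenever the package changes."""
    ordered = sorted(rows, key=lambda r: r[0])  # order by timestamp
    totals = []
    # groupby merges only consecutive equal keys, so com.package.1 forms two runs.
    for _package, run in groupby(ordered, key=lambda r: r[1]):
        total = 0
        for _ts, _pkg, diff in run:
            total += diff or 0  # assumption: a NULL diff counts as 0
            totals.append(total)
    return totals


if __name__ == "__main__":
    print(acc_time_with_reset(SAMPLE))  # [0, 83, 200, 400, 200]
```

For the five sample rows this agrees with the accTime column above. For a run of three rows with diffs 10, 11, 12 it gives 10, 21, 33, whereas diff plus the previous diff would give 10, 21, 23.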