Using an aggregate function that resets after a condition is met?
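I'm working with event data and am currently trying to calculate the time spent in an application by adding up the difference between each timestamp and the previous one. The problem is that I need to reset this running total every time the value of the 'packageName' column changes. I tried the following: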
SELECT
SUM(timeDifference) OVER(PARTITION BY packageName ORDER BY sNumber, timestamp) as accTime,
*
FROM table.name
ORDER BY
sNumber, timestamp
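However, the result seems a bit too clever: it remembers the earlier results for the same package and keeps accumulating them. Instead, I need it to forget its aggregate after each run of a package.
My question is whether there is any way to reset it. Below is an example of what I'm getting and the output I want. Any help would be greatly appreciated.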
What I'm getting:
**accTime diff packageName**
10 10 com.package.1
20 20 com.package.1
10 10 com.package.2
20 20 com.package.2
30 10 com.package.1
What I want:
**accTime diff packageName**
10 10 com.package.1
20 20 com.package.1
10 10 com.package.2
20 20 com.package.2
10 10 com.package.1
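The second example shows the accumulated time for the first package (com.package.1) being reset when it appears again, which is exactly what I need help with.
To explain myself further, here is a sample of the raw data: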
**timestamp packageName sNumber eventID diff**
1433119125117 com.package.1 xx123xx event1 null
1433119125200 com.package.1 xx123xx event2 83
1433119125400 com.package.2 xx123xx event3 200
1433119125600 com.package.2 xx123xx event4 200
1433119125800 com.package.1 xx123xx event5 200
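To make the difference concrete, here is a minimal Python sketch on the raw sample above, assuming the null first diff counts as 0. The first function imitates the PARTITION BY packageName running sum, which keeps one total per package no matter how the rows interleave; the second restarts the total whenever packageName differs from the previous row, which is the behavior being asked for. The function names are illustrative only.

```python
from itertools import groupby

# Raw sample from the question: (timestamp, packageName, diff). None is the missing first diff.
SAMPLE = [
    (1433119125117, "com.package.1", None),
    (1433119125200, "com.package.1", 83),
    (1433119125400, "com.package.2", 200),
    (1433119125600, "com.package.2", 200),
    (1433119125800, "com.package.1", 200),
]


def running_by_package(rows):
    """Like SUM(COALESCE(diff, 0)) OVER (PARTITION BY packageName ORDER BY timestamp)."""
    totals = {}  # one running total per package, never reset
    out = []
    for _, package, diff in sorted(rows):  # tuples sort by timestamp first
        totals[package] = totals.get(package, 0) + (diff or 0)
        out.append(totals[package])
    return out


def running_with_reset(rows):
    """Restart the running total whenever packageName differs from the previous row."""
    out = []
    # groupby only merges *adjacent* rows with the same key, so each run gets its own total.
    for _, run in groupby(sorted(rows), key=lambda row: row[1]):
        total = 0
        for _, _, diff in run:
            total += diff or 0  # treat the missing (None) diff as 0
            out.append(total)
    return out


print(running_by_package(SAMPLE))  # [0, 83, 200, 400, 283]
print(running_with_reset(SAMPLE))  # [0, 83, 200, 400, 200]
```

The only difference is the last row: the per-package sum carries 83 forward from the earlier com.package.1 run, while the reset version starts again at 200.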
I've been playing with some samples in the meantime. This is not a complete answer, but it might help someone.
select
pos,label,diff,
if (lag!=label or lag is null,1,0) as reset
from(
select
pos,label,diff,
LAG(label, 1) OVER (ORDER BY pos asc) lag,
from (select 10 as diff,'first' as label, 1 as pos),
(select 20 as diff,'first' as label, 2 as pos),
(select 10 as diff,'second' as label, 3 as pos),
(select 20 as diff,'second' as label, 4 as pos),
(select 10 as diff,'first' as label, 5 as pos),
(select 11 as diff,'first' as label, 6 as pos),
(select 12 as diff,'first' as label, 7 as pos),
order by pos
)
This returns:
+-----+-----+--------+------+-------+
| Row | pos | label  | diff | reset |
+-----+-----+--------+------+-------+
|   1 |   1 | first  |   10 |     1 |
|   2 |   2 | first  |   20 |     0 |
|   3 |   3 | second |   10 |     1 |
|   4 |   4 | second |   20 |     0 |
|   5 |   5 | first  |   10 |     1 |
|   6 |   6 | first  |   11 |     0 |
|   7 |   7 | first  |   12 |     0 |
+-----+-----+--------+------+-------+
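The reset flag is the first step of the common "gaps and islands" technique. What is still missing is turning the flags into a run number and partitioning the sum by it. Below is a minimal sketch on the toy rows above. It uses Python's built-in sqlite3 module as a stand-in for BigQuery, assuming the bundled SQLite is 3.25 or newer (the version that added window functions). The table name t and the column name grp are illustrative.

```python
import sqlite3

# Toy rows from the answer above: (pos, label, diff).
SAMPLE = [
    (1, "first", 10), (2, "first", 20), (3, "second", 10), (4, "second", 20),
    (5, "first", 10), (6, "first", 11), (7, "first", 12),
]

QUERY = """
WITH flagged AS (
    -- Step 1: reset = 1 on the first row and whenever label changes.
    SELECT pos, label, diff,
           CASE WHEN LAG(label, 1) OVER (ORDER BY pos) IS NULL
                  OR LAG(label, 1) OVER (ORDER BY pos) != label
                THEN 1 ELSE 0 END AS reset
    FROM t
),
grouped AS (
    -- Step 2: a running total of the flags numbers each contiguous run (grp).
    SELECT pos, label, diff, reset,
           SUM(reset) OVER (ORDER BY pos
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM flagged
)
-- Step 3: partitioning the running sum by grp restarts it at every new run.
SELECT pos, label, diff, reset,
       SUM(diff) OVER (PARTITION BY grp ORDER BY pos
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accTime
FROM grouped
ORDER BY pos
"""


def running_sum_with_reset(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (pos INTEGER, label TEXT, diff INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in running_sum_with_reset(SAMPLE):
    print(row)  # (pos, label, diff, reset, accTime)
```

Here grp comes out as 1, 1, 2, 2, 3, 3, 3, so accTime is 10, 30, 10, 30, 10, 21, 33. In BigQuery the same three steps would presumably be written as nested subqueries. That translation is an untested assumption.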
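Using the lag function (you'll notice my answer looks like Pentium's), I think this is what you want...
I'm not 100% sure, because your accTime seems to behave strangely relative to your diff... To me, accTime should be accTime + diff, shouldn't it? (Correct me if I'm wrong; now that the query is here, it's easy to adjust :))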
SELECT
timestamp,package,sNumber,eventID,diff,
CASE WHEN lagPackage IS NULL then 0
WHEN package != lagPackage THEN diff
ELSE (diff + IF(lagDiff is null, 0,lagDiff)) END AS accTime
FROM (
SELECT
*,
LAG(package,1) OVER (ORDER BY timestamp) AS lagPackage,
LAG(diff,1,0) OVER (ORDER BY timestamp) AS lagDiff
FROM (
SELECT
1433119125117 AS timestamp,
'com.package.1' AS package,
'xxx123xxx' AS sNumber,
'event1' AS eventID,
NULL AS diff),
(
SELECT
1433119125200 AS timestamp,
'com.package.1' AS package,
'xxx123xxx' AS sNumber,
'event2' AS eventID,
83 AS diff),
(
SELECT
1433119125400 AS timestamp,
'com.package.2' AS package,
'xxx123xxx' AS sNumber,
'event3' AS eventID,
200 AS diff),
(
SELECT
1433119125600 AS timestamp,
'com.package.2' AS package,
'xxx123xxx' AS sNumber,
'event4' AS eventID,
200 AS diff),
(
SELECT
1433119125800 AS timestamp,
'com.package.1' AS package,
'xxx123xxx' AS sNumber,
'event5' AS eventID,
200 AS diff),
ORDER BY
timestamp )
Based on the sample set you provided, this returns:
Row timestamp package sNumber eventID diff accTime
1 1433119125117 com.package.1 xxx123xxx event1 null 0
2 1433119125200 com.package.1 xxx123xxx event2 83 83
3 1433119125400 com.package.2 xxx123xxx event3 200 200
4 1433119125600 com.package.2 xxx123xxx event4 200 400
5 1433119125800 com.package.1 xxx123xxx event5 200 200
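The query above adds only the previous row's diff to the current one. It matches a running total here only because no package run in the sample is longer than two rows. On a third consecutive row of the same package it would return diff + lagDiff rather than the full sum.

Below is a minimal sketch of a general version, again using Python's sqlite3 as a stand-in for BigQuery (assuming SQLite 3.25+). It works in three steps:
1. Flag each row where the package changes.
2. Take a running total of those flags to number each run.
3. Sum within each run.

Partitioning by sNumber is an assumption based on the question's ORDER BY sNumber, timestamp. The table name events and the column names reset and grp are illustrative.

```python
import sqlite3

# Raw sample from the answer above: (timestamp, package, sNumber, eventID, diff).
SAMPLE = [
    (1433119125117, "com.package.1", "xxx123xxx", "event1", None),
    (1433119125200, "com.package.1", "xxx123xxx", "event2", 83),
    (1433119125400, "com.package.2", "xxx123xxx", "event3", 200),
    (1433119125600, "com.package.2", "xxx123xxx", "event4", 200),
    (1433119125800, "com.package.1", "xxx123xxx", "event5", 200),
]

QUERY = """
WITH flagged AS (
    -- 1 when a row starts a new run of the same package within its sNumber
    SELECT timestamp, package, sNumber, eventID, diff,
           CASE WHEN LAG(package) OVER (PARTITION BY sNumber ORDER BY timestamp) IS NULL
                  OR LAG(package) OVER (PARTITION BY sNumber ORDER BY timestamp) != package
                THEN 1 ELSE 0 END AS reset
    FROM events
),
grouped AS (
    -- Number the runs inside each sNumber
    SELECT *,
           SUM(reset) OVER (PARTITION BY sNumber ORDER BY timestamp
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp
    FROM flagged
)
-- Full running total per run; the NULL first diff counts as 0
SELECT timestamp, package, sNumber, eventID, diff,
       SUM(COALESCE(diff, 0)) OVER (PARTITION BY sNumber, grp ORDER BY timestamp
                                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS accTime
FROM grouped
ORDER BY sNumber, timestamp
"""


def acc_time_per_run(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (timestamp INTEGER, package TEXT, "
                     "sNumber TEXT, eventID TEXT, diff INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in acc_time_per_run(SAMPLE):
    print(row)  # (timestamp, package, sNumber, eventID, diff, accTime)
```

On this sample the accTime column is 0, 83, 200, 400, 200, the same as the output above. The two approaches only diverge once a run has three or more rows.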