我想累计计算以前重复值的数量
I want to count cumulatively the number of previous repeating values
输入数据框如下所示:
ID
Timestamp
Diff
1
2021-01-01 10:00:00
6
1
2021-01-01 10:00:06
6
1
2021-01-01 10:00:18
12
1
2021-01-01 10:00:24
6
1
2021-01-01 10:00:30
6
1
2021-01-01 10:00:36
6
2
2021-01-01 11:00:00
6
2
2021-01-01 11:00:06
6
2
2021-01-01 11:00:12
6
2
2021-01-01 11:00:24
12
2
2021-01-01 11:00:30
6
2
2021-01-01 11:00:36
6
将此应用到 diff 列将 结果 在新的值列中
ID
Timestamp
Diff
Result
1
2021-01-01 10:00:00
6
1
1
2021-01-01 10:00:06
6
2
1
2021-01-01 10:00:18
12
1
1
2021-01-01 10:00:24
6
1
1
2021-01-01 10:00:30
6
2
1
2021-01-01 10:00:36
6
3
2
2021-01-01 11:00:00
6
1
2
2021-01-01 11:00:06
6
2
2
2021-01-01 11:00:12
6
3
2
2021-01-01 11:00:24
12
1
2
2021-01-01 11:00:30
6
1
2
2021-01-01 11:00:36
6
2
select t.id
,t.timestamp
,t.diff
,row_number() over (partition by id, rn_global - rn_diff order by timestamp) as Result
from (select t.id
,t.timestamp
,t.diff
,row_number() over (partition by id order by timestamp) rn_global
,row_number() over (partition by id, diff order by timestamp) rn_diff
from t
) t
order by id, timestamp
--
+----+------------------------------+------+--------+
| id | timestamp | diff | Result |
+----+------------------------------+------+--------+
| 1 | 2021-01-01T10:00:00.000+0000 | 6 | 1 |
| 1 | 2021-01-01T10:00:06.000+0000 | 6 | 2 |
| 1 | 2021-01-01T10:00:18.000+0000 | 12 | 1 |
| 1 | 2021-01-01T10:00:24.000+0000 | 6 | 1 |
| 1 | 2021-01-01T10:00:30.000+0000 | 6 | 2 |
| 1 | 2021-01-01T10:00:36.000+0000 | 6 | 3 |
| 2 | 2021-01-01T11:00:00.000+0000 | 6 | 1 |
| 2 | 2021-01-01T11:00:06.000+0000 | 6 | 2 |
| 2 | 2021-01-01T11:00:12.000+0000 | 6 | 3 |
| 2 | 2021-01-01T11:00:24.000+0000 | 12 | 1 |
| 2 | 2021-01-01T11:00:30.000+0000 | 6 | 1 |
| 2 | 2021-01-01T11:00:36.000+0000 | 6 | 2 |
+----+------------------------------+------+--------+
输入数据框如下所示:
ID | Timestamp | Diff |
---|---|---|
1 | 2021-01-01 10:00:00 | 6 |
1 | 2021-01-01 10:00:06 | 6 |
1 | 2021-01-01 10:00:18 | 12 |
1 | 2021-01-01 10:00:24 | 6 |
1 | 2021-01-01 10:00:30 | 6 |
1 | 2021-01-01 10:00:36 | 6 |
2 | 2021-01-01 11:00:00 | 6 |
2 | 2021-01-01 11:00:06 | 6 |
2 | 2021-01-01 11:00:12 | 6 |
2 | 2021-01-01 11:00:24 | 12 |
2 | 2021-01-01 11:00:30 | 6 |
2 | 2021-01-01 11:00:36 | 6 |
将此应用到 diff 列将 结果 在新的值列中
ID | Timestamp | Diff | Result |
---|---|---|---|
1 | 2021-01-01 10:00:00 | 6 | 1 |
1 | 2021-01-01 10:00:06 | 6 | 2 |
1 | 2021-01-01 10:00:18 | 12 | 1 |
1 | 2021-01-01 10:00:24 | 6 | 1 |
1 | 2021-01-01 10:00:30 | 6 | 2 |
1 | 2021-01-01 10:00:36 | 6 | 3 |
2 | 2021-01-01 11:00:00 | 6 | 1 |
2 | 2021-01-01 11:00:06 | 6 | 2 |
2 | 2021-01-01 11:00:12 | 6 | 3 |
2 | 2021-01-01 11:00:24 | 12 | 1 |
2 | 2021-01-01 11:00:30 | 6 | 1 |
2 | 2021-01-01 11:00:36 | 6 | 2 |
select t.id
,t.timestamp
,t.diff
,row_number() over (partition by id, rn_global - rn_diff order by timestamp) as Result
from (select t.id
,t.timestamp
,t.diff
,row_number() over (partition by id order by timestamp) rn_global
,row_number() over (partition by id, diff order by timestamp) rn_diff
from t
) t
order by id, timestamp
--
+----+------------------------------+------+--------+
| id | timestamp | diff | Result |
+----+------------------------------+------+--------+
| 1 | 2021-01-01T10:00:00.000+0000 | 6 | 1 |
| 1 | 2021-01-01T10:00:06.000+0000 | 6 | 2 |
| 1 | 2021-01-01T10:00:18.000+0000 | 12 | 1 |
| 1 | 2021-01-01T10:00:24.000+0000 | 6 | 1 |
| 1 | 2021-01-01T10:00:30.000+0000 | 6 | 2 |
| 1 | 2021-01-01T10:00:36.000+0000 | 6 | 3 |
| 2 | 2021-01-01T11:00:00.000+0000 | 6 | 1 |
| 2 | 2021-01-01T11:00:06.000+0000 | 6 | 2 |
| 2 | 2021-01-01T11:00:12.000+0000 | 6 | 3 |
| 2 | 2021-01-01T11:00:24.000+0000 | 12 | 1 |
| 2 | 2021-01-01T11:00:30.000+0000 | 6 | 1 |
| 2 | 2021-01-01T11:00:36.000+0000 | 6 | 2 |
+----+------------------------------+------+--------+