How to group into batches after assigning rank
I have a table that I first try to group by unique column values (using dense_rank), and then further group those items into batches of 5. Here is my table:
video_id | frame_id | verb |
---|---|---|
video_a | frame_1 | walk |
video_a | frame_2 | run |
video_a | frame_3 | sit |
video_a | frame_4 | walk |
video_a | frame_5 | walk |
video_a | frame_6 | walk |
video_b | frame_7 | stand |
video_b | frame_8 | stand |
video_b | frame_9 | run |
video_b | frame_10 | run |
video_b | frame_11 | sit |
video_b | frame_12 | run |
video_b | frame_13 | run |
This is what I am trying to get:
video_id | frame_id | verb | batch_of_five |
---|---|---|---|
video_a | frame_1 | walk | 1 |
video_a | frame_2 | run | 1 |
video_a | frame_3 | sit | 1 |
video_a | frame_4 | walk | 1 |
video_a | frame_5 | walk | 1 |
video_a | frame_6 | walk | 2 |
video_b | frame_7 | stand | 3 |
video_b | frame_8 | stand | 3 |
video_b | frame_9 | run | 3 |
video_b | frame_10 | run | 3 |
video_b | frame_11 | sit | 3 |
video_b | frame_12 | run | 4 |
video_b | frame_13 | run | 4 |
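Here each video_id gets a unique rank, and every batch of 5 within a ranked video_id gets its own unique rank (so every batch of 5 has a unique ID overall, regardless of whether the batches belong to the same video_id).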
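I am able to group by the video_id column, but I cannot work out how to further group these items so that they fall into batches of 5 whose IDs are unique across all video_ids. I considered using a GROUP BY clause, but I also want to keep the other columns intact (the verb column).
Here is my query so far: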
SELECT
    *
FROM (
    SELECT
        *,
        -- Give each unique video_id a unique rank
        DENSE_RANK() OVER (ORDER BY video_id) AS video_batch
    FROM videos
)
Compute each frame's rank (partitioned by video_id), subtract 1, and divide by 5 (integer division) to get the batch number within the video_id partition. Then apply dense_rank again to get the absolute batch number:
with sample_data as (
    select 'video_a' as video_id, 'frame_1' as frame_id, 'walk' as verb union all
    select 'video_a', 'frame_2' , 'run'   union all
    select 'video_a', 'frame_3' , 'sit'   union all
    select 'video_a', 'frame_4' , 'walk'  union all
    select 'video_a', 'frame_5' , 'walk'  union all
    select 'video_a', 'frame_6' , 'walk'  union all
    select 'video_b', 'frame_7' , 'stand' union all
    select 'video_b', 'frame_8' , 'stand' union all
    select 'video_b', 'frame_9' , 'run'   union all
    select 'video_b', 'frame_10', 'run'   union all
    select 'video_b', 'frame_11', 'sit'   union all
    select 'video_b', 'frame_12', 'run'   union all
    select 'video_b', 'frame_13', 'run'
)
select s.*,
       -- (rnk_frame - 1) / 5 maps ranks 1-5 to 0, ranks 6-10 to 1, and so on.
       -- This relies on integer division; wrap it in floor() if your engine returns a fraction for "/".
       dense_rank() over (order by video_id, (rnk_frame - 1) / 5) as batch_of_five
from (
    select video_id, frame_id, verb,
           -- numeric suffix of frame_id, so that frame_10 sorts after frame_9
           cast(regexp_extract(frame_id, '_(\d+)$', 1) as int) as frame_number,
           dense_rank() over (partition by video_id
                              order by cast(regexp_extract(frame_id, '_(\d+)$', 1) as int)) as rnk_frame
    from sample_data
) s
order by video_id, frame_number;
Result:
video_id | frame_id | verb | frame_number | rnk_frame | batch_of_five |
---|---|---|---|---|---|
video_a | frame_1 | walk | 1 | 1 | 1 |
video_a | frame_2 | run | 2 | 2 | 1 |
video_a | frame_3 | sit | 3 | 3 | 1 |
video_a | frame_4 | walk | 4 | 4 | 1 |
video_a | frame_5 | walk | 5 | 5 | 1 |
video_a | frame_6 | walk | 6 | 6 | 2 |
video_b | frame_7 | stand | 7 | 1 | 3 |
video_b | frame_8 | stand | 8 | 2 | 3 |
video_b | frame_9 | run | 9 | 3 | 3 |
video_b | frame_10 | run | 10 | 4 | 3 |
video_b | frame_11 | sit | 11 | 5 | 3 |
video_b | frame_12 | run | 12 | 6 | 4 |
video_b | frame_13 | run | 13 | 7 | 4 |
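The batching arithmetic can also be traced outside SQL. The following is a minimal Python sketch, not part of the original answer. It assumes a batch size of 5 and uses `(rank - 1) // batch_size` as the batch index inside each video, so ranks 1 to 5 map to 0, ranks 6 to 10 map to 1, and so on. The names `frame_number`, `assign_batches`, and `batch_size` are made up for illustration.

```python
def frame_number(frame_id):
    # 'frame_10' -> 10; assumes the id ends in '_<digits>'
    return int(frame_id.rsplit("_", 1)[1])


def assign_batches(rows, batch_size=5):
    """rows: list of (video_id, frame_id, verb) tuples.

    Returns the rows ordered by video and frame number, each extended
    with a batch number that is unique across all videos.
    """
    ordered = sorted(rows, key=lambda r: (r[0], frame_number(r[1])))

    keyed = []
    rank_in_video = {}
    for video_id, frame_id, verb in ordered:
        # rank of the frame inside its video (1, 2, 3, ...)
        rank = rank_in_video.get(video_id, 0) + 1
        rank_in_video[video_id] = rank
        # batch index inside the video: 0 for ranks 1-5, 1 for ranks 6-10, ...
        key = (video_id, (rank - 1) // batch_size)
        keyed.append((key, (video_id, frame_id, verb)))

    # dense rank over the distinct (video_id, batch index) pairs
    distinct_keys = sorted({key for key, _ in keyed})
    dense = {key: i for i, key in enumerate(distinct_keys, start=1)}
    return [row + (dense[key],) for key, row in keyed]


rows = (
    [("video_a", f"frame_{i}", "walk") for i in range(1, 7)]
    + [("video_b", f"frame_{i}", "run") for i in range(7, 14)]
)

for row in assign_batches(rows):
    print(row)
```

With the 13 sample frames this yields batch numbers 1, 1, 1, 1, 1, 2 for video_a and 3, 3, 3, 3, 3, 4, 4 for video_b. The verb values in `rows` are placeholders, since they do not affect the batching.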
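I extract frame_number and sort on it as an integer rather than as a string, to get the same ordering as in your question (some ordering column is absolutely required). If you already have the rank you mentioned, you can use that instead.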
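To illustrate why the frame number is compared as an integer, here is a small Python sketch (an illustration only; the list `frame_ids` is made-up sample data). A plain string sort compares character by character, so frame_10 lands before frame_2, while sorting on the extracted integer suffix gives the intended order.

```python
frame_ids = ["frame_1", "frame_2", "frame_10", "frame_11"]

# Plain string sort: "frame_10" and "frame_11" sort before "frame_2".
as_strings = sorted(frame_ids)

# Sort on the integer after the last underscore instead.
as_numbers = sorted(frame_ids, key=lambda f: int(f.rsplit("_", 1)[1]))

print(as_strings)  # ['frame_1', 'frame_10', 'frame_11', 'frame_2']
print(as_numbers)  # ['frame_1', 'frame_2', 'frame_10', 'frame_11']
```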