How to group into batches after assigning rank

I have a table that I'm trying to first group by unique column values (using dense_rank) and then split further into batches of 5. Here is my table:

video_id frame_id verb
video_a frame_1 walk
video_a frame_2 run
video_a frame_3 sit
video_a frame_4 walk
video_a frame_5 walk
video_a frame_6 walk
video_b frame_7 stand
video_b frame_8 stand
video_b frame_9 run
video_b frame_10 run
video_b frame_11 sit
video_b frame_12 run
video_b frame_13 run

Here is what I want to get:

video_id frame_id verb batch_of_five
video_a frame_1 walk 1
video_a frame_2 run 1
video_a frame_3 sit 1
video_a frame_4 walk 1
video_a frame_5 walk 1
video_a frame_6 walk 2
video_b frame_7 stand 3
video_b frame_8 stand 3
video_b frame_9 run 3
video_b frame_10 run 3
video_b frame_11 sit 3
video_b frame_12 run 4
video_b frame_13 run 4

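Each video_id should get a unique rank, each batch of five within a ranked video_id should get its own rank, and every batch of five should have an ID that is unique across the whole table, whether or not the batches belong to the same video_id.

I can group by the video_id column, but I can't work out how to split those groups further into batches of five whose IDs are unique across all video_ids. I considered a GROUP BY clause, but I also want to keep the other columns (such as verb) intact.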
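The batching rule can be stated in plain Python before writing any SQL: walk the rows in order, open a new batch at the first frame of every video and again after every five frames, and number the batches globally. This is a minimal sketch that assumes the rows already arrive grouped by video_id and sorted by frame; the names ROWS, BATCH_SIZE and assign_batches are illustrative only.

```python
from itertools import groupby

# Sample rows from the question: frame i belongs to video_a for i <= 6, else video_b.
VERBS = ["walk", "run", "sit", "walk", "walk", "walk",
         "stand", "stand", "run", "run", "sit", "run", "run"]
ROWS = [("video_a" if i <= 6 else "video_b", f"frame_{i}", verb)
        for i, verb in enumerate(VERBS, start=1)]

BATCH_SIZE = 5  # taken from the batch_of_five column in the expected output


def assign_batches(rows, batch_size=BATCH_SIZE):
    """Return one global batch id per row, in input order.

    Assumes rows are already grouped by video_id and sorted by frame.
    """
    ids = []
    batch_id = 0
    for _video_id, group in groupby(rows, key=lambda row: row[0]):
        for position, _row in enumerate(group):
            # Open a new batch at the first frame of each video and every
            # batch_size frames after that.
            if position % batch_size == 0:
                batch_id += 1
            ids.append(batch_id)
    return ids


for row, batch in zip(ROWS, assign_batches(ROWS)):
    print(*row, batch)
```

Because the counter only ever increases, batch IDs stay unique across videos, which is the same property the SQL needs to reproduce with a second ranking step.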

Here is my quick query so far:

SELECT
    *
FROM (
    SELECT
        *,
        -- Give each unique video_id a unique rank
        DENSE_RANK() OVER (ORDER BY video_id) AS video_batch
    FROM videos
) t
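To see what this first step produces on the sample table, the query can be run in Python's built-in sqlite3 module. This is only a stand-in: the question does not name its engine, SQLite needs version 3.25 or later for window functions, and SUBSTR(frame_id, 7) is a hypothetical substitute for frame ordering that assumes every frame_id starts with the six-character prefix frame_.

```python
import sqlite3

# Sample rows from the question: frame i belongs to video_a for i <= 6, else video_b.
VERBS = ["walk", "run", "sit", "walk", "walk", "walk",
         "stand", "stand", "run", "run", "sit", "run", "run"]
ROWS = [("video_a" if i <= 6 else "video_b", f"frame_{i}", verb)
        for i, verb in enumerate(VERBS, start=1)]

# In-memory SQLite database standing in for the real `videos` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id TEXT, frame_id TEXT, verb TEXT)")
conn.executemany("INSERT INTO videos VALUES (?, ?, ?)", ROWS)

# The question's query: DENSE_RANK gives each distinct video_id one number.
result = conn.execute("""
    SELECT video_id, frame_id,
           DENSE_RANK() OVER (ORDER BY video_id) AS video_batch
    FROM videos
    ORDER BY video_id, CAST(SUBSTR(frame_id, 7) AS INTEGER)
""").fetchall()

for row in result:
    print(row)
```

Every video_a row gets video_batch 1 and every video_b row gets 2, so this ranking alone cannot split a video into batches of five.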

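Compute each frame's rank within its video_id partition, then subtract 1 and integer-divide by 5 to get the batch number inside that partition (ranks 1-5 give 0, ranks 6-10 give 1, and so on). Then rank again over (video_id, batch number) to get the absolute batch number. In Hive, / always returns a DOUBLE, so the division is wrapped in floor():

with sample_data as(
select 'video_a' as video_id, 'frame_1' as frame_id , 'walk' as verb union all
select 'video_a', 'frame_2' , 'run'   union all
select 'video_a', 'frame_3' , 'sit'   union all
select 'video_a', 'frame_4' , 'walk'  union all
select 'video_a', 'frame_5' , 'walk'  union all
select 'video_a', 'frame_6' , 'walk'  union all
select 'video_b', 'frame_7' , 'stand' union all
select 'video_b', 'frame_8' , 'stand' union all
select 'video_b', 'frame_9' , 'run'   union all
select 'video_b', 'frame_10', 'run'   union all
select 'video_b', 'frame_11', 'sit'   union all
select 'video_b', 'frame_12', 'run'   union all
select 'video_b', 'frame_13', 'run'
)

select s.*,
       -- floor((rnk_frame - 1) / 5): ranks 1-5 -> 0, 6-10 -> 1, ... within each video;
       -- ranking (video_id, bucket) makes the batch ids unique across all videos
       dense_rank() over(order by video_id, floor((rnk_frame - 1) / 5)) batch_of_five
from
(
select video_id, frame_id, verb,
       -- numeric suffix of frame_id; '\\d' because Hive string literals consume one backslash
       CAST(regexp_extract(frame_id,'_(\\d+)$',1) AS INT) frame_number,
       dense_rank() over(partition by video_id order by CAST(regexp_extract(frame_id,'_(\\d+)$',1) AS INT)) rnk_frame
  from sample_data
)s
order by video_id, frame_number;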
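The same two-step ranking can be tried outside Hive with Python's built-in sqlite3 module. This sketch makes two substitutions that are assumptions, not part of the answer: SQLite has no regexp_extract, so SUBSTR(frame_id, 7) extracts the frame number (assuming the frame_ prefix), and in SQLite integer / integer already truncates, so (rnk_frame - 1) / 5 is integer division without floor(). Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question: frame i belongs to video_a for i <= 6, else video_b.
VERBS = ["walk", "run", "sit", "walk", "walk", "walk",
         "stand", "stand", "run", "run", "sit", "run", "run"]
ROWS = [("video_a" if i <= 6 else "video_b", f"frame_{i}", verb)
        for i, verb in enumerate(VERBS, start=1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (video_id TEXT, frame_id TEXT, verb TEXT)")
conn.executemany("INSERT INTO videos VALUES (?, ?, ?)", ROWS)

result = conn.execute("""
    WITH s AS (
        SELECT video_id, frame_id, verb,
               CAST(SUBSTR(frame_id, 7) AS INTEGER) AS frame_number,
               -- rank of each frame inside its own video
               DENSE_RANK() OVER (
                   PARTITION BY video_id
                   ORDER BY CAST(SUBSTR(frame_id, 7) AS INTEGER)
               ) AS rnk_frame
        FROM videos
    )
    SELECT video_id, frame_id, verb, frame_number, rnk_frame,
           -- (rnk_frame - 1) / 5 buckets ranks 1-5, 6-10, ...; ranking the
           -- (video_id, bucket) pairs yields globally unique batch ids
           DENSE_RANK() OVER (ORDER BY video_id, (rnk_frame - 1) / 5) AS batch_of_five
    FROM s
    ORDER BY video_id, frame_number
""").fetchall()

for row in result:
    print(row)
```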

Result:

video_id    frame_id    verb    frame_number    rnk_frame   batch_of_five
video_a     frame_1     walk    1                1           1
video_a     frame_2     run     2                2           1
video_a     frame_3     sit     3                3           1
video_a     frame_4     walk    4                4           1
video_a     frame_5     walk    5                5           1
video_a     frame_6     walk    6                6           2
video_b     frame_7     stand   7                1           3
video_b     frame_8     stand   8                2           3
video_b     frame_9     run     9                3           3
video_b     frame_10    run     10               4           3
video_b     frame_11    sit     11               5           3
video_b     frame_12    run     12               6           4
video_b     frame_13    run     13               7           4
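The sample has at most seven frames per video, so it cannot show whether batches stay at exactly five for longer videos. A quick check of the within-video bucket formula (rank - 1) // 5 on hypothetical longer videos counts how many ranks land in each bucket; bucket_sizes is an illustrative helper, not part of the answer.

```python
from collections import Counter


def bucket_sizes(n_frames, batch_size=5):
    """Count how many frame ranks 1..n_frames fall into each bucket (rank - 1) // batch_size."""
    counts = Counter((rank - 1) // batch_size for rank in range(1, n_frames + 1))
    return [counts[bucket] for bucket in sorted(counts)]


print(bucket_sizes(7))   # [5, 2], matching video_b above
print(bucket_sizes(12))  # [5, 5, 2]
```

Every bucket except possibly the last holds exactly five frames, however long the video is.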

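I extract frame_number and sort by it as an integer rather than as a string, so that the order matches the one in your question (some ordering column is absolutely required). If you already have the rank you mentioned, you can use it instead.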
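Why the integer extraction matters can be shown with a handful of frame IDs: as strings, frame_10 sorts before frame_2. This sketch mirrors the regexp_extract idea with Python's re module and assumes, as in the sample data, that every ID ends with an underscore followed by digits; frame_number here is an illustrative helper.

```python
import re

FRAME_IDS = ["frame_1", "frame_2", "frame_10", "frame_11", "frame_9"]


def frame_number(frame_id: str) -> int:
    # Assumes every id ends with "_<digits>", as in the sample data.
    match = re.search(r"_(\d+)$", frame_id)
    if match is None:
        raise ValueError(f"no numeric suffix in {frame_id!r}")
    return int(match.group(1))


as_strings = sorted(FRAME_IDS)                    # lexicographic order
as_numbers = sorted(FRAME_IDS, key=frame_number)  # numeric order

print(as_strings)  # ['frame_1', 'frame_10', 'frame_11', 'frame_2', 'frame_9']
print(as_numbers)  # ['frame_1', 'frame_2', 'frame_9', 'frame_10', 'frame_11']
```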