How to enforce a maximum number of rows per day in SQL?

Given data like the following, where the date is a string in YYYYMMDD format:

item    vietnamese  cost  unique_id  sales_date
fruits  trai cay    10    abc123     20211001
fruits  trai cay    8     foo99      20211001
fruits  trai cay    9     foo99      20211001
vege    rau         3     rr1239     20211001
vege    rau         3     rr1239     20211001
fruits  trai cay    12    abc123     20211002
fruits  trai cay    14    abc123     20211002
fruits  trai cay    8     abc123     20211002
fruits  trai cay    5     foo99      20211002
vege    rau         8     rr1239     20211002
vege    rau         1     rr1239     20211002
vege    rau         12    ud9213     20211002
vege    rau         19    r11759     20211002
fruits  trai cay    6     foo99      20211003
fruits  trai cay    2     abc123     20211003
fruits  trai cay    12    abc123     20211003
vege    rau         1     ud97863    20211003
vege    rau         9     r112359    20211003
fruits  trai cay    6     foo99      20211004
fruits  trai cay    2     abc123     20211004
fruits  trai cay    12    abc123     20211004
vege    rau         9     r112359    20211004

The goal is to sample from all rows within a given date range, e.g. 2021-10-02 to 2021-10-03, while extracting at most 3 rows per day. For example, using this query:

SELECT * FROM mytable
WHERE sales_date BETWEEN '20211002' AND '20211003'
ORDER BY RAND() LIMIT 6;

The expected output for the table above is:

item    vietnamese  cost  unique_id  sales_date
fruits  trai cay    8     abc123     20211002
fruits  trai cay    5     foo99      20211002
vege    rau         8     rr1239     20211002
fruits  trai cay    12    abc123     20211003
vege    rau         1     ud97863    20211003
vege    rau         9     r112359    20211003

But it is possible for all 6 returned rows to come from a single day:

item    vietnamese  cost  unique_id  sales_date
fruits  trai cay    12    abc123     20211002
fruits  trai cay    14    abc123     20211002
fruits  trai cay    8     abc123     20211002
fruits  trai cay    5     foo99      20211002
vege    rau         8     rr1239     20211002
vege    rau         1     rr1239     20211002
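
The lopsided result is easy to reproduce. Below is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the real database; SQLite spells the function random() rather than RAND(), and the helper names make_db and max_rows_per_day are made up for illustration. It loads the 13 rows for 20211002 and 20211003 from the table above and repeats the unconstrained query to see how many rows a single day can end up contributing.

```python
import sqlite3
from collections import Counter

# Rows for 20211002 and 20211003, copied from the sample data above.
ROWS = [
    ("fruits", "trai cay", 12, "abc123", "20211002"),
    ("fruits", "trai cay", 14, "abc123", "20211002"),
    ("fruits", "trai cay", 8, "abc123", "20211002"),
    ("fruits", "trai cay", 5, "foo99", "20211002"),
    ("vege", "rau", 8, "rr1239", "20211002"),
    ("vege", "rau", 1, "rr1239", "20211002"),
    ("vege", "rau", 12, "ud9213", "20211002"),
    ("vege", "rau", 19, "r11759", "20211002"),
    ("fruits", "trai cay", 6, "foo99", "20211003"),
    ("fruits", "trai cay", 2, "abc123", "20211003"),
    ("fruits", "trai cay", 12, "abc123", "20211003"),
    ("vege", "rau", 1, "ud97863", "20211003"),
    ("vege", "rau", 9, "r112359", "20211003"),
]


def make_db():
    """Create an in-memory table with the same columns as mytable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (item TEXT, vietnamese TEXT, cost INTEGER, "
        "unique_id TEXT, sales_date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def max_rows_per_day(conn):
    """Run the unconstrained sample once; return the largest per-day count."""
    dates = [
        row[0]
        for row in conn.execute(
            "SELECT sales_date FROM mytable "
            "WHERE sales_date BETWEEN '20211002' AND '20211003' "
            "ORDER BY random() LIMIT 6"  # random() is SQLite's RAND()
        )
    ]
    return max(Counter(dates).values())


conn = make_db()
# Repeat the sample many times and keep the most lopsided outcome.
worst = max(max_rows_per_day(conn) for _ in range(200))
print("largest number of rows drawn from one day:", worst)
```

Any value above 3 here means the plain ORDER BY ... LIMIT 6 query exceeded the intended per-day cap in at least one run.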

So to make sure I get at most 3 rows per day, I run a separate query for each day, i.e.

SELECT * FROM mytable
WHERE sales_date = '20211002'
ORDER BY RAND() LIMIT 3;

SELECT * FROM mytable
WHERE sales_date = '20211003'
ORDER BY RAND() LIMIT 3;

Is there a way to guarantee a maximum of N rows per day in a single query?

Otherwise, is there a way to combine the one-query-per-day approach into a single "super query"? For a whole year that would be 365 queries, one per day.
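
One way such a "super query" could be assembled is to wrap each per-day query in a subquery and join the pieces with UNION ALL. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database (so random() replaces RAND()); the helper build_super_query is hypothetical, and other engines may want different syntax around each branch. The dates are passed as bound parameters, not pasted into the SQL text.

```python
import sqlite3
from collections import Counter

# Rows for 20211002 and 20211003, copied from the sample data above.
ROWS = [
    ("fruits", "trai cay", 12, "abc123", "20211002"),
    ("fruits", "trai cay", 14, "abc123", "20211002"),
    ("fruits", "trai cay", 8, "abc123", "20211002"),
    ("fruits", "trai cay", 5, "foo99", "20211002"),
    ("vege", "rau", 8, "rr1239", "20211002"),
    ("vege", "rau", 1, "rr1239", "20211002"),
    ("vege", "rau", 12, "ud9213", "20211002"),
    ("vege", "rau", 19, "r11759", "20211002"),
    ("fruits", "trai cay", 6, "foo99", "20211003"),
    ("fruits", "trai cay", 2, "abc123", "20211003"),
    ("fruits", "trai cay", 12, "abc123", "20211003"),
    ("vege", "rau", 1, "ud97863", "20211003"),
    ("vege", "rau", 9, "r112359", "20211003"),
]


def build_super_query(n_days, per_day):
    """Return one statement with n_days branches, each capped at per_day rows."""
    # The inner LIMIT applies to a single day because it sits in a subquery.
    one_day = (
        "SELECT * FROM (SELECT * FROM mytable WHERE sales_date = ? "
        f"ORDER BY random() LIMIT {int(per_day)})"
    )
    return " UNION ALL ".join([one_day] * n_days)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (item TEXT, vietnamese TEXT, cost INTEGER, "
    "unique_id TEXT, sales_date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

dates = ["20211002", "20211003"]
sample = conn.execute(build_super_query(len(dates), 3), dates).fetchall()
per_day = Counter(row[4] for row in sample)  # row[4] is sales_date
print(per_day)
```

The statement grows by one branch per day, so a full year means 365 branches and 365 bound parameters in a single query.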

Since 6 rows over 2 days with a cap of 3 means exactly 3 rows per day, let's extend the range to a week.

Use row_number in a subquery to assign a number to each row within each date. Then select only the rows whose row number is 3 or less.

select *
from (
  select
    *,
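    -- number the rows within each date in random order;
    -- the column is called rn because "row" is a reserved word in some dialects (e.g. MySQL 8)
    row_number() over (partition by sales_date order by rand()) as rn
  from mytable
  where sales_date between '20211002' and '20211009'
) as numbered  -- some dialects (e.g. MySQL) require an alias on a derived table
where rn <= 3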
order by rand()
limit 6
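
To check that the query behaves as described, here is a minimal sketch that runs it against the sample data using Python's built-in sqlite3 module as a stand-in database. It assumes SQLite 3.25 or newer (needed for window functions), uses random() in place of rand(), calls the row-number column rn, and gives the subquery the alias numbered; those details are assumptions for the sketch, not part of the answer above.

```python
import sqlite3
from collections import Counter

# All rows from the sample table in the question.
ROWS = [
    ("fruits", "trai cay", 10, "abc123", "20211001"),
    ("fruits", "trai cay", 8, "foo99", "20211001"),
    ("fruits", "trai cay", 9, "foo99", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("fruits", "trai cay", 12, "abc123", "20211002"),
    ("fruits", "trai cay", 14, "abc123", "20211002"),
    ("fruits", "trai cay", 8, "abc123", "20211002"),
    ("fruits", "trai cay", 5, "foo99", "20211002"),
    ("vege", "rau", 8, "rr1239", "20211002"),
    ("vege", "rau", 1, "rr1239", "20211002"),
    ("vege", "rau", 12, "ud9213", "20211002"),
    ("vege", "rau", 19, "r11759", "20211002"),
    ("fruits", "trai cay", 6, "foo99", "20211003"),
    ("fruits", "trai cay", 2, "abc123", "20211003"),
    ("fruits", "trai cay", 12, "abc123", "20211003"),
    ("vege", "rau", 1, "ud97863", "20211003"),
    ("vege", "rau", 9, "r112359", "20211003"),
    ("fruits", "trai cay", 6, "foo99", "20211004"),
    ("fruits", "trai cay", 2, "abc123", "20211004"),
    ("fruits", "trai cay", 12, "abc123", "20211004"),
    ("vege", "rau", 9, "r112359", "20211004"),
]

# Same shape as the query above, adapted to SQLite.
QUERY = """
select *
from (
  select
    *,
    row_number() over (partition by sales_date order by random()) as rn
  from mytable
  where sales_date between '20211002' and '20211009'
) as numbered
where rn <= 3
order by random()
limit 6
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (item TEXT, vietnamese TEXT, cost INTEGER, "
    "unique_id TEXT, sales_date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

sample = conn.execute(QUERY).fetchall()
per_day = Counter(row[4] for row in sample)  # row[4] is sales_date
print(per_day)
```

With this data the range covers three days (20211002 to 20211004), so up to 9 rows survive the rn <= 3 filter and the outer limit picks 6 of them at random; no day can contribute more than 3.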