How to enforce a maximum number of rows per day in SQL?
Given data like the following, where the date is a string in YYYYMMDD format:
item | vietnamese | cost | unique_id | sales_date |
---|---|---|---|---|
fruits | trai cay | 10 | abc123 | 20211001 |
fruits | trai cay | 8 | foo99 | 20211001 |
fruits | trai cay | 9 | foo99 | 20211001 |
vege | rau | 3 | rr1239 | 20211001 |
vege | rau | 3 | rr1239 | 20211001 |
fruits | trai cay | 12 | abc123 | 20211002 |
fruits | trai cay | 14 | abc123 | 20211002 |
fruits | trai cay | 8 | abc123 | 20211002 |
fruits | trai cay | 5 | foo99 | 20211002 |
vege | rau | 8 | rr1239 | 20211002 |
vege | rau | 1 | rr1239 | 20211002 |
vege | rau | 12 | ud9213 | 20211002 |
vege | rau | 19 | r11759 | 20211002 |
fruits | trai cay | 6 | foo99 | 20211003 |
fruits | trai cay | 2 | abc123 | 20211003 |
fruits | trai cay | 12 | abc123 | 20211003 |
vege | rau | 1 | ud97863 | 20211003 |
vege | rau | 9 | r112359 | 20211003 |
fruits | trai cay | 6 | foo99 | 20211004 |
fruits | trai cay | 2 | abc123 | 20211004 |
fruits | trai cay | 12 | abc123 | 20211004 |
vege | rau | 9 | r112359 | 20211004 |
The goal is to sample rows within a given date range, e.g. 2021-10-02 to 2021-10-03, and to extract at most 3 rows per day. For example, with this query:
SELECT * FROM mytable
WHERE sales_date BETWEEN '20211002' AND '20211003'
ORDER BY RAND() LIMIT 6
The expected output for the table above is:
item | vietnamese | cost | unique_id | sales_date |
---|---|---|---|---|
fruits | trai cay | 8 | abc123 | 20211002 |
fruits | trai cay | 5 | foo99 | 20211002 |
vege | rau | 8 | rr1239 | 20211002 |
fruits | trai cay | 12 | abc123 | 20211003 |
vege | rau | 1 | ud97863 | 20211003 |
vege | rau | 9 | r112359 | 20211003 |
But it is possible that all 6 rows come from a single day:
item | vietnamese | cost | unique_id | sales_date |
---|---|---|---|---|
fruits | trai cay | 12 | abc123 | 20211002 |
fruits | trai cay | 14 | abc123 | 20211002 |
fruits | trai cay | 8 | abc123 | 20211002 |
fruits | trai cay | 5 | foo99 | 20211002 |
vege | rau | 8 | rr1239 | 20211002 |
vege | rau | 1 | rr1239 | 20211002 |
So to make sure I get at most 3 rows per day, I run one query per day, i.e.
SELECT * FROM mytable
WHERE sales_date='20211002'
ORDER BY RAND() LIMIT 3
and
SELECT * FROM mytable
WHERE sales_date='20211003'
ORDER BY RAND() LIMIT 3
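Is there a way to enforce a maximum of N rows per day in a single query?
Otherwise, is there a way to combine the one-query-per-day approach into a single "super query"? For a whole year, that would be 365 queries, one per day.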
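Since 6 rows over 2 days would mean exactly 3 rows per day, let's extend the range to a week.
Use row_number in a subquery to assign a number to each row within each date. Then select only the rows whose row number is 3 or less.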
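select *
from (
  select
    *,
    -- number the rows within each date, in random order
    -- (aliased as rn because ROW is a reserved word in MySQL 8.0)
    row_number() over (partition by sales_date order by rand()) as rn
  from mytable
  where sales_date between '20211002' and '20211009'
) as numbered  -- MySQL requires an alias on a derived table
where rn <= 3  -- keep at most 3 rows per date
order by rand()
limit 6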
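The query above can be tried end to end with the sample data from the question. What follows is only a minimal sketch, and it assumes SQLite through Python's standard `sqlite3` module rather than the asker's database, so `random()` stands in for `RAND()` (window functions need SQLite 3.25 or newer). The helper name `sample_capped` and its parameters are made up for illustration.

```python
import sqlite3

# Sample rows from the question: (item, vietnamese, cost, unique_id, sales_date)
ROWS = [
    ("fruits", "trai cay", 10, "abc123", "20211001"),
    ("fruits", "trai cay", 8, "foo99", "20211001"),
    ("fruits", "trai cay", 9, "foo99", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("fruits", "trai cay", 12, "abc123", "20211002"),
    ("fruits", "trai cay", 14, "abc123", "20211002"),
    ("fruits", "trai cay", 8, "abc123", "20211002"),
    ("fruits", "trai cay", 5, "foo99", "20211002"),
    ("vege", "rau", 8, "rr1239", "20211002"),
    ("vege", "rau", 1, "rr1239", "20211002"),
    ("vege", "rau", 12, "ud9213", "20211002"),
    ("vege", "rau", 19, "r11759", "20211002"),
    ("fruits", "trai cay", 6, "foo99", "20211003"),
    ("fruits", "trai cay", 2, "abc123", "20211003"),
    ("fruits", "trai cay", 12, "abc123", "20211003"),
    ("vege", "rau", 1, "ud97863", "20211003"),
    ("vege", "rau", 9, "r112359", "20211003"),
    ("fruits", "trai cay", 6, "foo99", "20211004"),
    ("fruits", "trai cay", 2, "abc123", "20211004"),
    ("fruits", "trai cay", 12, "abc123", "20211004"),
    ("vege", "rau", 9, "r112359", "20211004"),
]


def sample_capped(conn, start, end, per_day, total):
    """Return up to `total` random rows dated start..end, at most `per_day` per date.

    Hypothetical helper: the SQL mirrors the answer's query, in SQLite dialect.
    """
    query = """
        SELECT item, vietnamese, cost, unique_id, sales_date
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY sales_date ORDER BY random()
                   ) AS rn
            FROM mytable
            WHERE sales_date BETWEEN ? AND ?
        ) AS numbered
        WHERE rn <= ?
        ORDER BY random()
        LIMIT ?
    """
    return conn.execute(query, (start, end, per_day, total)).fetchall()


# Build an in-memory table with the same columns as in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "item TEXT, vietnamese TEXT, cost INTEGER, unique_id TEXT, sales_date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

# Draw 6 random rows over three days, never more than 3 from any one day.
for row in sample_capped(conn, "20211002", "20211004", per_day=3, total=6):
    print(row)
```

With the range 20211002 to 20211003, the cap of 3 per day together with the limit of 6 forces exactly 3 rows from each day. Over a longer range the split between days varies from run to run, but no day contributes more than 3 rows.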