SELECT 1 percent of distinct values
SQL Server 2008 R2
I have a table named Actions; this is a snippet of what it looks like:
ActionID | ActionType | ActionUserID | ActionDateTime
---------+------------+--------------+---------------------
555363   | Open       | 9843         | 2020-09-15 09:27:55
555364   | Process    | 2563         | 2020-09-15 09:31:22
555365   | Close      | 8522         | 2020-09-15 09:37:48
555366   | Detour     | 9843         | 2020-09-15 09:42:42
555367   | Process    | 9843         | 2020-09-15 09:51:50
555368   | Close      | 8522         | 2020-09-15 09:55:45
555369   | Open       | 1685         | 2020-09-15 09:57:12
555370   | Detour     | 2563         | 2020-09-15 10:03:23
555371   | Detour     | 9843         | 2020-09-15 10:04:33
555372   | Close      | 8522         | 2020-09-15 10:07:44
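To try the queries in this thread against the sample rows above, a minimal setup script might look like the following. The column types and the primary key are assumptions; the question does not state them.

```sql
-- Assumed schema: types and key are guesses, not taken from the question.
CREATE TABLE Actions (
    ActionID       INT         NOT NULL PRIMARY KEY,
    ActionType     VARCHAR(20) NOT NULL,
    ActionUserID   INT         NOT NULL,
    ActionDateTime DATETIME    NOT NULL
);

-- The ten sample rows from the question.
-- ISO 8601 literals (with the T) parse the same way under any language setting.
INSERT INTO Actions (ActionID, ActionType, ActionUserID, ActionDateTime)
VALUES
    (555363, 'Open',    9843, '2020-09-15T09:27:55'),
    (555364, 'Process', 2563, '2020-09-15T09:31:22'),
    (555365, 'Close',   8522, '2020-09-15T09:37:48'),
    (555366, 'Detour',  9843, '2020-09-15T09:42:42'),
    (555367, 'Process', 9843, '2020-09-15T09:51:50'),
    (555368, 'Close',   8522, '2020-09-15T09:55:45'),
    (555369, 'Open',    1685, '2020-09-15T09:57:12'),
    (555370, 'Detour',  2563, '2020-09-15T10:03:23'),
    (555371, 'Detour',  9843, '2020-09-15T10:04:33'),
    (555372, 'Close',   8522, '2020-09-15T10:07:44');
```

With so few rows per user, any 1% sample of this data comes down to at most one row per user, so it is only useful for checking that a query runs.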
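The table has a few hundred thousand rows. What I want to do is review 1% of all the actions performed by each user in a given month.
I know I can get 1% of everything with: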
SELECT TOP 1 PERCENT *
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
I know I can get 1% for a specific user with:
SELECT TOP 1 PERCENT *
FROM Actions
WHERE ActionUserID = 9843
AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
But what I really want to get is 1% of each user. I know I can get the list of users who performed actions during the month with:
SELECT DISTINCT(ActionUserID)
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
But I'm not sure how to combine the two queries.
But what I really want to get is 1% of each user.
I would recommend the window function percent_rank():
select *
from (
select a.*, percent_rank() over(partition by actionuserid order by newid()) prn
from actions a
where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where prn < 0.01
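For readers unfamiliar with percent_rank(): it returns (rank - 1) / (rows in partition - 1), so it runs from 0 for the first row to 1 for the last. The sketch below illustrates this on five made-up values, ordered by a fixed column instead of newid() so the result is repeatable. It needs SQL Server 2012 or later.

```sql
-- Illustrative values only; v and n are hypothetical names.
SELECT v.n,
       RANK()         OVER (ORDER BY v.n) AS rnk,
       PERCENT_RANK() OVER (ORDER BY v.n) AS prn
FROM (VALUES (10), (20), (30), (40), (50)) AS v(n);
-- rnk: 1, 2, 3, 4, 5
-- prn: 0, 0.25, 0.5, 0.75, 1   i.e. (rnk - 1) / (5 - 1)
```

Because the first row of every partition has a percent rank of 0, the filter prn < 0.01 keeps at least one row for each user who has any action in the month.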
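If your SQL Server version is too old to support percent_rank() (it was added in SQL Server 2012, so 2008 R2 does not have it), we can emulate it with rank() and count():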
select *
from (
select a.*,
rank() over(partition by actionuserid order by newid()) as rn,
count(*) over(partition by actionuserid) as cnt
from actions a
where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where 100.0 * rn / cnt < 1 or (rn = 1 and cnt <= 100) -- <= 100 so a user with exactly 100 rows still gets one row
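To see how many rows per user a rank()/count() predicate of this kind keeps, a check along the following lines may help. It is a sketch, not part of the original answer: the column aliases total_actions and sampled_actions are hypothetical, and the predicate uses cnt <= 100 on the assumption that a user with exactly 100 actions should still contribute one row.

```sql
WITH ranked AS (
    SELECT a.ActionUserID,
           RANK()   OVER (PARTITION BY a.ActionUserID ORDER BY NEWID()) AS rn,
           COUNT(*) OVER (PARTITION BY a.ActionUserID)                  AS cnt
    FROM Actions a
    WHERE a.ActionDateTime >= '20200901' AND a.ActionDateTime < '20201001'
)
SELECT ActionUserID,
       MAX(cnt) AS total_actions,          -- cnt is constant within a user
       SUM(CASE WHEN 100.0 * rn / cnt < 1
                  OR (rn = 1 AND cnt <= 100) -- assumed: at least one row per user
                THEN 1 ELSE 0 END) AS sampled_actions
FROM ranked
GROUP BY ActionUserID
ORDER BY ActionUserID;
```

Every user should show sampled_actions of at least 1, and roughly 1% of total_actions for users with many actions. This runs on SQL Server 2008 R2, since it uses only rank() and count() as window functions.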
You can easily combine the two queries using CROSS APPLY:
SELECT a.*
FROM (
SELECT DISTINCT ActionUserID
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
) u
CROSS APPLY
(
SELECT TOP 1 PERCENT *
FROM Actions a
WHERE a.ActionUserID = u.ActionUserID
AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
) a
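One thing to watch with the BETWEEN filter: assuming ActionDateTime is a DATETIME column, the upper bound '09/30/2020' means midnight at the start of September 30, so actions later that day are left out. A variant of the CROSS APPLY query with a half-open date range, as a sketch, could look like this (the alias s is hypothetical):

```sql
SELECT s.*
FROM (
    -- Users with at least one action in September 2020
    SELECT DISTINCT ActionUserID
    FROM Actions
    WHERE ActionDateTime >= '20200901' AND ActionDateTime < '20201001'
) u
CROSS APPLY (
    -- Random 1% of that user's actions in the same month
    SELECT TOP (1) PERCENT a.*
    FROM Actions a
    WHERE a.ActionUserID = u.ActionUserID
      AND a.ActionDateTime >= '20200901' AND a.ActionDateTime < '20201001'
    ORDER BY NEWID()
) s;
```

TOP ... PERCENT rounds a fractional row count up to the next whole row, so every user in the list contributes at least one action. The 'yyyymmdd' literals are also read the same way regardless of the session's date format setting.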