SELECT 1 percent of distinct values

SQL Server 2008 R2

I have a table named Actions. Here is a snippet of what it looks like:

ActionID | ActionType | ActionUserID | ActionDateTime
---------+------------+--------------+---------------------
555363     Open         9843           2020-09-15 09:27:55
555364     Process      2563           2020-09-15 09:31:22
555365     Close        8522           2020-09-15 09:37:48
555366     Detour       9843           2020-09-15 09:42:42
555367     Process      9843           2020-09-15 09:51:50
555368     Close        8522           2020-09-15 09:55:45
555369     Open         1685           2020-09-15 09:57:12
555370     Detour       2563           2020-09-15 10:03:23
555371     Detour       9843           2020-09-15 10:04:33
555372     Close        8522           2020-09-15 10:07:44

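The table has a few hundred thousand rows. What I want to do is review 1% of all the actions each user performed in a particular month.

I know I can get 1% of everything with: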

SELECT TOP 1 PERCENT * 
FROM Actions 
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020' 
ORDER BY NEWID()

I know I can get 1% for a specific user with:

SELECT TOP 1 PERCENT * 
FROM Actions 
WHERE ActionUserID = 9843 
  AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020' 
ORDER BY NEWID()

But what I really want to get is 1% of each user. I know I can get the list of users who performed actions in that month with:

SELECT DISTINCT(ActionUserID) 
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'

But I am not sure how to combine these two queries.

But what I really want to get is 1% of each user.

I would recommend the window function percent_rank():

select *
from (
    select a.*, percent_rank() over(partition by actionuserid order by newid()) prn
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where prn < 0.01
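
percent_rank() is computed as (rank - 1) / (rows in partition - 1), so the first row in each user's random ordering always gets 0 and every user contributes at least one row. As a rough sanity check, the sketch below compares each user's sample size with that user's total for the month. It assumes a SQL Server version that supports percent_rank(); the column aliases total_actions and sampled_actions are made up for illustration.

```sql
-- Sketch: per-user sample size versus total actions in September 2020.
-- total_actions and sampled_actions are illustrative aliases.
select actionuserid,
       count(*) as total_actions,
       sum(case when prn < 0.01 then 1 else 0 end) as sampled_actions
from (
    select a.actionuserid,
           percent_rank() over(partition by actionuserid order by newid()) as prn
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) s
group by actionuserid
order by actionuserid
```

Because newid() reshuffles the rows on every run, the sampled rows change between executions, but the per-user counts should stay the same.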

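If your SQL Server version is too old to support percent_rank() (it was introduced in SQL Server 2012, so it is not available on 2008 R2), we can emulate it with rank() and count():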

select *
from (
    select a.*, 
        rank() over(partition by actionuserid order by newid()) as rn,
        count(*) over(partition by actionuserid) as cnt
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where 100.0 * rn / cnt  < 1 or (rn = 1 and cnt <= 100)
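
TOP 1 PERCENT rounds a fractional row count up to the next whole row, so if the goal is to match what the per-user TOP 1 PERCENT query from the question would return, a variant along these lines may be closer. This is a sketch rather than a tested drop-in: it swaps rank() for row_number() and compares against ceiling(cnt * 0.01). Both window functions used here are available on SQL Server 2008 R2.

```sql
-- Sketch: keep ceiling(1% of the user's rows) randomly chosen rows per user.
select *
from (
    select a.*,
        row_number() over(partition by actionuserid order by newid()) as rn,
        count(*) over(partition by actionuserid) as cnt
    from actions a
    where actiondatetime >= '20200901' and actiondatetime < '20201001'
) s
where rn <= ceiling(cnt * 0.01)
```

With this filter a user with 1 to 100 actions yields one row, 101 to 200 actions yields two rows, and so on.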

You can easily combine the two queries with CROSS APPLY:

SELECT a.*
FROM (
    SELECT DISTINCT ActionUserID 
    FROM Actions
    WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
) u
CROSS APPLY
(
    SELECT TOP 1 PERCENT * 
    FROM Actions a
    WHERE a.ActionUserID = u.ActionUserID 
    AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020' 
    ORDER BY NEWID()
) a
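
One caveat about the date filter: ActionDateTime carries a time component, so BETWEEN '09/01/2020' AND '09/30/2020' stops at midnight at the start of September 30 and skips nearly all of that day's actions. A possible variant of the CROSS APPLY query, sketched here with a half-open range and the language-neutral 'yyyymmdd' literal format, would be:

```sql
-- Sketch: same CROSS APPLY approach, covering the whole of September 2020.
SELECT a.*
FROM (
    SELECT DISTINCT ActionUserID
    FROM Actions
    WHERE ActionDateTime >= '20200901' AND ActionDateTime < '20201001'
) u
CROSS APPLY
(
    SELECT TOP 1 PERCENT *
    FROM Actions x
    WHERE x.ActionUserID = u.ActionUserID
    AND x.ActionDateTime >= '20200901' AND x.ActionDateTime < '20201001'
    ORDER BY NEWID()
) a
```

The inner alias x is only there to keep the inner and outer references visually distinct; the original reuse of a is also valid.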