sql: Select count(*) - nth record from each group

Question

I am grouping by tenant_id. From each GROUP BY group whose count() is greater than 1000, I want to select the record at position count() - 1000 (ordered by the _updated time), like this:

select t1.tenant_id,
(select temp._updated
 from trace temp
 where temp.tenant_id = t1.tenant_id
 order by _updated limit 1 offset   
    count(*) - 1000
) as timekey 
from fgc.trace as t1
group by tenant_id 
having count(*)  > 1000;

But this isn't allowed, because count(*) can't be used inside the subquery.

So I tried the following, but it still doesn't work, because I can't access t1 here: it isn't a join.

select t1.tenant_id,
(select temp._updated
 from trace temp
 where temp.tenant_id = t1.tenant_id
 order by _updated limit 1 offset   
    (select count(*)-1000 
     from trace t2
     group by tenant_id 
     having t2.tenant_id = t1.tenant_id)
) as timekey 
from fgc.trace as t1
group by tenant_id 
having count(*)  > 1000;
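The intent of the OFFSET count(*) - 1000 attempt can be expressed without a correlated subquery by putting the per-tenant count on every row with COUNT(*) OVER (PARTITION BY tenant_id). A 0-based OFFSET of count(*) - n on an ascending sort lands on the 1-based ascending row number count(*) - n + 1. The sketch below is not taken from either answer; it uses SQLite (3.25+ is assumed, for window function support) as a stand-in for CockroachDB, hypothetical sample data, and a hypothetical build_db helper.

```python
import sqlite3
from datetime import datetime, timedelta


def build_db():
    """Hypothetical sample data: tenant "a" has 1200 rows, tenant "b" has 500."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trace (tenant_id TEXT, _updated TEXT)")
    start = datetime(2019, 7, 1)
    rows = [("a", (start + timedelta(seconds=i)).isoformat()) for i in range(1200)]
    rows += [("b", (start + timedelta(seconds=i)).isoformat()) for i in range(500)]
    conn.executemany("INSERT INTO trace VALUES (?, ?)", rows)
    return conn


# cnt carries the per-tenant count on every row, so the "offset" becomes a
# plain comparison against the ascending row number. cnt > :n mirrors the
# question's HAVING count(*) > 1000.
QUERY = """
SELECT tenant_id, _updated AS timekey
FROM (
    SELECT tenant_id, _updated,
           ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY _updated) AS rn,
           COUNT(*) OVER (PARTITION BY tenant_id) AS cnt
    FROM trace
) x
WHERE cnt > :n AND rn = cnt - :n + 1
"""

conn = build_db()
result = conn.execute(QUERY, {"n": 1000}).fetchall()
print(result)
```

Tenant "a" (1200 rows) yields its row at ascending position 201, which is the same row OFFSET 200 would return; tenant "b" is filtered out. The same SQL should carry over to CockroachDB, which also supports COUNT(*) OVER, but that has not been verified here.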

So how can I get the following?

  tenant_id |             timekey               
+-----------+----------------------------------+
  n7ia6ryc  | 2019-07-23 23:09:49.951406+00:00

Answer 1

It seems you want ROW_NUMBER(). CockroachDB supports window functions, so:

SELECT tenant_id, _updated AS timekey
FROM (
    SELECT
        tenant_id,
        _updated,
        ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY _updated DESC) AS rn
    FROM trace
) x
WHERE rn = 1001;

For each tenant_id, this returns the timestamp of the 1001st most recent record. Tenants with 1000 or fewer records do not appear in the result.
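To check the claim about which tenants appear, here is a minimal sketch that runs the ROW_NUMBER() query above against SQLite (assumed 3.25+) as a stand-in for CockroachDB. The three tenants and their row counts are hypothetical; the column name _updated comes from the question.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (tenant_id TEXT, _updated TEXT)")
start = datetime(2019, 7, 1)
# Hypothetical data: "a" has 1001 rows, "b" has exactly 1000, "c" has 10.
for tenant, n_rows in [("a", 1001), ("b", 1000), ("c", 10)]:
    conn.executemany(
        "INSERT INTO trace VALUES (?, ?)",
        [(tenant, (start + timedelta(seconds=i)).isoformat()) for i in range(n_rows)],
    )

# Rank rows newest-first within each tenant and keep rank 1001.
# A tenant needs at least 1001 rows for rank 1001 to exist.
rows = conn.execute("""
    SELECT tenant_id, _updated AS timekey
    FROM (
        SELECT tenant_id, _updated,
               ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY _updated DESC) AS rn
        FROM trace
    ) x
    WHERE rn = 1001
""").fetchall()
print(rows)
```

Only tenant "a" comes back, and its 1001st most recent row is its oldest one; tenant "b", with exactly 1000 rows, is excluded.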

Answer 2

select x.tenant_id
from (
  select t.tenant_id,
         row_number() over (partition by t.tenant_id order by t._updated) as tenant_number
  from fgc.trace as t
) x
where x.tenant_number > 1000
group by x.tenant_id
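The query above returns each tenant that has more than 1000 rows, which is the same set the question's HAVING count(*) > 1000 selects. A small sketch comparing the two on hypothetical data, with SQLite (assumed 3.25+) standing in for CockroachDB and integer timestamps to keep it short. The ORDER BY clauses were added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (tenant_id TEXT, _updated INTEGER)")
# Hypothetical data: "a" has 1500 rows, "b" exactly 1000, "c" 1001.
for tenant, n_rows in [("a", 1500), ("b", 1000), ("c", 1001)]:
    conn.executemany("INSERT INTO trace VALUES (?, ?)", [(tenant, i) for i in range(n_rows)])

# The row_number() approach: a tenant survives if any row is numbered above 1000.
window_version = conn.execute("""
    SELECT x.tenant_id
    FROM (
        SELECT t.tenant_id,
               ROW_NUMBER() OVER (PARTITION BY t.tenant_id ORDER BY t._updated) AS tenant_number
        FROM trace AS t
    ) x
    WHERE x.tenant_number > 1000
    GROUP BY x.tenant_id
    ORDER BY x.tenant_id
""").fetchall()

# The plain aggregate filter from the question.
having_version = conn.execute("""
    SELECT tenant_id FROM trace
    GROUP BY tenant_id
    HAVING COUNT(*) > 1000
    ORDER BY tenant_id
""").fetchall()
print(window_version, having_version)
```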

For just one timestamp, it would look like this:

select min(x._updated) as min_timestamp
from (
  select t.tenant_id, t._updated,
         row_number() over (partition by t.tenant_id order by t._updated) as tenant_number
  from fgc.trace as t
) x
where x.tenant_number > 1000

Note that the grouping doesn't matter here, because each row can belong to only one group and you are only looking at one row.
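A quick way to see why the GROUP BY can be dropped: the minimum over all qualifying rows equals the smallest of the per-tenant minimums, since every row belongs to exactly one tenant. The sketch below runs both forms against SQLite (assumed 3.25+) on hypothetical integer timestamps; the per-tenant variant with GROUP BY x.tenant_id is an illustration added here, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trace (tenant_id TEXT, _updated INTEGER)")
# Hypothetical data: tenant "a" has timestamps 0..1099, tenant "b" 5000..6199.
conn.executemany("INSERT INTO trace VALUES (?, ?)", [("a", i) for i in range(1100)])
conn.executemany("INSERT INTO trace VALUES (?, ?)", [("b", 5000 + i) for i in range(1200)])

ranked = """
    SELECT t.tenant_id, t._updated,
           ROW_NUMBER() OVER (PARTITION BY t.tenant_id ORDER BY t._updated) AS tenant_number
    FROM trace AS t
"""

# One timestamp across all tenants, as in the query above.
(global_min,) = conn.execute(
    f"SELECT MIN(x._updated) FROM ({ranked}) x WHERE x.tenant_number > 1000"
).fetchone()

# Same filter, one row per tenant: each value is that tenant's 1001st-oldest timestamp.
per_tenant = conn.execute(
    f"SELECT x.tenant_id, MIN(x._updated) FROM ({ranked}) x "
    "WHERE x.tenant_number > 1000 GROUP BY x.tenant_id ORDER BY x.tenant_id"
).fetchall()
print(global_min, per_tenant)
```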

sql

cockroachdb