Count rows for each unique combination of columns in SQL

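I would like to return a set of unique records from a table, based on two columns, along with the most recent posting time and a total count of how many times the combination of those two columns appeared before (in time) the record that is output.

So what I'm trying to get is something along these lines: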

select T.col1, T.col2, h.max_posted, count  -- "count" is the value I don't know how to compute
from T
join (
  select col1, col2, max(posted) as max_posted
  from T
  where groupid = 'XXX'
  group by col1, col2) h
on (T.col1 = h.col1 and
    T.col2 = h.col2 and
    T.posted = h.max_posted)
where T.groupid = 'XXX'

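Count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output. (I hope I explained that correctly :)

Edit: Trying the suggestion below: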

select dx.*,
       count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as cnt
from dx
join (
  select cicd9, cdesc, max(tposted) as tposted
  from dx
  where groupid = 'XXX'
  group by cicd9, cdesc) h
on (dx.cicd9 = h.cicd9 and
    dx.cdesc = h.cdesc and
    dx.tposted = h.tposted)
where dx.groupid = 'XXX';

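The count always returns '1'. Also, how would you count only the records that occurred BEFORE tposted?

This also fails, but I hope you get the idea of what I'm after: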

  WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc), 
    J AS (
    SELECT  count(*) as cnt
    FROM dx, h
    WHERE dx.cicd9 = h.cicd9
      and dx.cdesc = h.cdesc
      and dx.tposted <= h.tposted
      and dx.groupid = 'XXX'
 )
SELECT H.*,J.cnt
FROM H,J 

Can anyone help?

Do you just want a cumulative count?

select t.*,
       count(*) over (partition by col1, col2 order by posted) as cnt
from table t
where groupid = 'xxx';
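
To see what that running count actually returns, here is a minimal sketch that runs the same kind of query through Python's `sqlite3` module as a stand-in for PostgreSQL. The table `t`, its rows, and the integer `posted` values are all made up for illustration (the thread never shows real data), and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: (col1, col2, posted, groupid).
rows = [
    ("A", "x", 1, "XXX"),
    ("A", "x", 2, "XXX"),
    ("A", "x", 3, "XXX"),
    ("B", "y", 1, "XXX"),
    ("A", "x", 4, "other"),  # different group, filtered out by WHERE
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, posted INTEGER, groupid TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# With ORDER BY inside the window, count(*) becomes a running count:
# each row sees itself plus all earlier rows of the same (col1, col2).
running = conn.execute("""
    SELECT col1, col2, posted,
           count(*) OVER (PARTITION BY col1, col2 ORDER BY posted) AS cnt
    FROM t
    WHERE groupid = 'XXX'
    ORDER BY col1, col2, posted
""").fetchall()

for row in running:
    print(row)
```

The row with the latest `posted` in each partition carries the total for that combination, which is why the count has to be computed before the rows are narrowed down to the latest one, not after.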

How about this:

SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
  max(tposted) OVER w AS last_post,
  count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
  PARTITION BY cicd9, cdesc
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

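Given the lack of a PG version, table definition, data and desired output, this is just shooting from the hip, but the principle should work: partition on the two columns for the rows where groupid = 'XXX', then find the maximum of the posted-time column and the total number of rows in the window frame (hence the RANGE... clause in the window definition).

This is the best I could come up with. Better suggestions are welcome!

This produces the results I need, with the understanding that the count will always be at least 1 (from the join):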
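
As a rough check of that principle, the sketch below runs a window with a whole-partition frame through Python's `sqlite3` module. Several things here are assumptions: SQLite (3.25 or later) stands in for PostgreSQL, plain `SELECT DISTINCT` replaces the PostgreSQL-only `DISTINCT ON`, the timestamp column is taken to be `tposted` as in the question, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dx (cicd9 TEXT, cdesc TEXT, tposted INTEGER, groupid TEXT)")
# Hypothetical rows: (cicd9, cdesc, tposted, groupid).
conn.executemany("INSERT INTO dx VALUES (?, ?, ?, ?)", [
    ("250.00", "diabetes", 1, "XXX"),
    ("250.00", "diabetes", 2, "XXX"),
    ("250.00", "diabetes", 3, "XXX"),
    ("401.9", "hypertension", 5, "XXX"),
    ("401.9", "hypertension", 7, "XXX"),
    ("250.00", "diabetes", 9, "YYY"),  # other group, excluded by WHERE
])

# The frame spans the whole partition, so every row of a combination
# carries the same max and the same total; DISTINCT collapses them.
per_group = conn.execute("""
    SELECT DISTINCT cicd9, cdesc,
           max(tposted) OVER w AS last_post,
           count(*)     OVER w AS num_posts
    FROM dx
    WHERE groupid = 'XXX'
    WINDOW w AS (
        PARTITION BY cicd9, cdesc
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )
    ORDER BY cicd9, cdesc
""").fetchall()

for row in per_group:
    print(row)
```

Because the window has no ORDER BY, the explicit frame clause mostly documents the intent; it would matter as soon as an ORDER BY were added to the window.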

  SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx 
join (
SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid   =  'XXX' 
    group by cicd9, cdesc) h
on 
  (dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX')
group by dx.cicd9, dx.cdesc
order by dx.cdesc;

The same query, written with a CTE:

 WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc)  
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx, H
where dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX'
group by dx.cicd9, dx.cdesc
order by cdesc;

This is confusing:

Count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output.

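Since every record is, by definition, "before" (or at the same time as) the latest post, this effectively means the total count per combination (ignoring the presumed off-by-one error in the sentence).

So this boils down to a simple GROUP BY: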

SELECT cicd9, cdesc
     , max(tposted) AS last_posted
     , count(*)    AS ct
FROM   dx
WHERE  groupid = 'XXX'
GROUP  BY 1, 2
ORDER  BY 1, 2;

It returns exactly the same result as the currently accepted answer, just faster and simpler.
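
For anyone who wants to convince themselves of the equivalence, here is a small sketch using Python's `sqlite3` module as a stand-in for PostgreSQL. The rows are invented, the timestamp column is assumed to be `tposted` as in the question, and both queries are ordered by both columns so the comparison is deterministic. The last query is an extra assumption of mine: if "before" is meant strictly, and the latest `tposted` is unique per combination, subtracting 1 drops the latest row itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dx (cicd9 TEXT, cdesc TEXT, tposted INTEGER, groupid TEXT)")
# Hypothetical rows: (cicd9, cdesc, tposted, groupid).
conn.executemany("INSERT INTO dx VALUES (?, ?, ?, ?)", [
    ("250.00", "diabetes", 1, "XXX"),
    ("250.00", "diabetes", 2, "XXX"),
    ("250.00", "diabetes", 3, "XXX"),
    ("401.9", "hypertension", 5, "XXX"),
    ("401.9", "hypertension", 7, "XXX"),
    ("250.00", "diabetes", 9, "YYY"),  # other group, must not be counted
])

# Self-join version: count rows at or before the latest post per combination.
join_version = conn.execute("""
    SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
    FROM dx
    JOIN (SELECT cicd9, cdesc, max(tposted) AS tposted
          FROM dx
          WHERE groupid = 'XXX'
          GROUP BY cicd9, cdesc) h
      ON dx.cicd9 = h.cicd9 AND dx.cdesc = h.cdesc
     AND dx.tposted <= h.tposted AND dx.groupid = 'XXX'
    GROUP BY dx.cicd9, dx.cdesc
    ORDER BY dx.cicd9, dx.cdesc
""").fetchall()

# Plain aggregate version (column names spelled out instead of GROUP BY 1, 2).
group_by_version = conn.execute("""
    SELECT cicd9, cdesc, max(tposted) AS last_posted, count(*) AS ct
    FROM dx
    WHERE groupid = 'XXX'
    GROUP BY cicd9, cdesc
    ORDER BY cicd9, cdesc
""").fetchall()

# Strictly-before variant; assumes the latest tposted is unique per combination.
strictly_before = conn.execute("""
    SELECT cicd9, cdesc, count(*) - 1 AS ct_before
    FROM dx
    WHERE groupid = 'XXX'
    GROUP BY cicd9, cdesc
    ORDER BY cicd9, cdesc
""").fetchall()

print(join_version == group_by_version)
print(group_by_version)
print(strictly_before)
```

This only shows agreement on one toy data set, of course, but the reasoning holds in general: no row of a combination can be later than that combination's own maximum, so the `<=` condition in the join never filters anything out.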