How to compute frequency/count of concurrent events by combination in postgresql?

Question

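I am looking for a way to identify event names that co-occur, i.e. to associate event names that share the same start (startts) and end (endts) times. The events are exactly concurrent: partial overlap does not occur in this database, which makes the condition easier to meet.

Toy data: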

+------------------+
|name startts endts|
| A   02:20  02:23 |
| A   02:23  02:25 |
| A   02:27  02:28 |
| B   02:20  02:23 |
| B   02:23  02:25 |
| B   02:25  02:27 |
| C   02:27  02:28 |
| D   02:27  02:28 |
| D   02:28  02:31 |
| E   02:27  02:28 |
| E   02:29  02:31 |
+------------------+

Ideal output:


+-------------+-------+
| combination | count |
+-------------+-------+
| AB          | 2     |
| AC          | 1     |
| AE          | 1     |
| AD          | 1     |
| BC          | 0     |
| BD          | 0     |
| BE          | 0     |
| CE          | 0     |
+-------------+-------+

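Naturally, my first instinct would be a loop, but I realize that is not the best approach in PostgreSQL.

What I have tried is to generate a temporary table by selecting the distinct combinations of name, start and end, and then left-joining it to the table itself (selecting name).

User @GMB provided the following (modified) solution; however, given the size of the database, performance is unsatisfactory (a query over even a 10-minute time window never completes). For context, there are about 300-400 unique names, so up to roughly 80,000 combinations (400 × 399 / 2 = 79,800, if my math is right). Order does not matter, so these are combinations rather than permutations.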
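
As a quick check on that estimate, the number of unordered pairs of n distinct names is n(n-1)/2. A minimal Python sketch, where 300 and 400 are simply the bounds quoted above:

```python
from math import comb

# Unordered pairs of distinct names: C(n, 2) = n * (n - 1) / 2
pairs_300 = comb(300, 2)
pairs_400 = comb(400, 2)

print(pairs_300)  # 44850
print(pairs_400)  # 79800
```

So the number of candidate pairs lies somewhere between about 45,000 and 80,000, depending on how many names are actually present.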

@GMB's attempt, which I understand as a self-join with aggregation and a conditional count of matching intervals:

    select t1.name name1, t2.name name2,
        sum(case when t1.startts = t2.startts and t1.endts = t2.endts then 1 else 0 end) cnt
    from mytable t1
    inner join mytable t2 on t2.name > t1.name
    group by t1.name, t2.name
    order by t1.name, t2.name

Demo on DB Fiddle:

name1 | name2 | cnt
:---- | :---- | --:
A     | B     |   2
A     | C     |   1
A     | D     |   1
A     | E     |   1
B     | C     |   0
B     | D     |   0
B     | E     |   0
C     | D     |   1
C     | E     |   1
D     | E     |   1
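
The query above pairs every row with every row of a later name and only then compares the intervals, which may explain why it is slow on a large table. One possible alternative, sketched below, is to join directly on startts and endts so that only exactly concurrent rows are ever paired. This is not @GMB's query; it is a variation, run here against an in-memory SQLite database through Python's sqlite3 module purely to keep the example self-contained. It has not been tested on Redshift, and the table name mytable is taken from the query above.

```python
import sqlite3

# Toy data from the question: (name, startts, endts)
rows = [
    ("A", "02:20", "02:23"), ("A", "02:23", "02:25"), ("A", "02:27", "02:28"),
    ("B", "02:20", "02:23"), ("B", "02:23", "02:25"), ("B", "02:25", "02:27"),
    ("C", "02:27", "02:28"),
    ("D", "02:27", "02:28"), ("D", "02:28", "02:31"),
    ("E", "02:27", "02:28"), ("E", "02:29", "02:31"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, startts TEXT, endts TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# Equi-join on the interval: rows are paired only when both startts and
# endts match, so the join does not build the name-by-name cross product.
PAIR_COUNTS = """
    SELECT t1.name AS name1, t2.name AS name2, COUNT(*) AS cnt
    FROM mytable t1
    JOIN mytable t2
      ON  t2.startts = t1.startts
      AND t2.endts   = t1.endts
      AND t2.name    > t1.name
    GROUP BY t1.name, t2.name
    ORDER BY t1.name, t2.name
"""
pair_counts = conn.execute(PAIR_COUNTS).fetchall()
for name1, name2, cnt in pair_counts:
    print(name1, name2, cnt)
```

On the toy data this returns the same non-zero rows as the demo output. Pairs that never co-occur (B/C, B/D, B/E) simply do not appear; if the zero rows are needed, they would have to be added back, for example by left-joining a list of all name pairs against this result.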

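@GMB points out that if you are looking for the count of overlapping intervals instead, all you have to do is change the sum() to the following (written here with CASE, as in the query above, because PostgreSQL's sum() does not accept a boolean):

    sum(case when t1.startts <= t2.endts and t1.endts >= t2.startts then 1 else 0 end) cnt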
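
To make the overlap condition concrete, here is a small Python sketch of the same predicate. The function name overlaps is hypothetical, and the times are the zero-padded HH:MM strings from the toy data, which compare correctly as plain text.

```python
def overlaps(start1, end1, start2, end2):
    """Inclusive overlap test: start1 <= end2 and end1 >= start2."""
    return start1 <= end2 and end1 >= start2

# Three cases taken from the toy data
exact = overlaps("02:20", "02:23", "02:20", "02:23")     # identical intervals
touching = overlaps("02:20", "02:23", "02:23", "02:25")  # share only an endpoint
disjoint = overlaps("02:20", "02:23", "02:25", "02:27")  # gap between them

print(exact, touching, disjoint)  # True True False
```

Because the comparisons are inclusive, back-to-back intervals such as 02:20-02:23 and 02:23-02:25 count as overlapping. Switching to strict < and > would exclude them.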

Version: PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.19097

Thanks.

Answer 1

Consider the following in MySQL (which is what your DB Fiddle points to):

    SELECT name, COUNT(*)
    FROM (
        SELECT group_concat(name ORDER BY name) name
        FROM mytable
        GROUP BY startts, endts
        ORDER BY name
    ) as names
    GROUP BY name
    ORDER BY name

The PostgreSQL equivalent:

    SELECT name, COUNT(*)
    FROM (
        -- string_agg requires an explicit delimiter; ',' matches group_concat's default
        SELECT string_agg(name, ',' ORDER BY name) name
        FROM mytable
        GROUP BY startts, endts
        ORDER BY name
    ) as names
    GROUP BY name
    ORDER BY name

First you build the list of concurrent events (in the subquery), then you count how often each list occurs.
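
The two steps can be traced in plain Python on the toy data from the question. This is only an illustration of the logic, not a replacement for the SQL. The last step, which expands each group into pairs, is an extra that the queries above do not perform; it is included because the question asks for per-pair counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy data from the question: (name, startts, endts)
rows = [
    ("A", "02:20", "02:23"), ("A", "02:23", "02:25"), ("A", "02:27", "02:28"),
    ("B", "02:20", "02:23"), ("B", "02:23", "02:25"), ("B", "02:25", "02:27"),
    ("C", "02:27", "02:28"),
    ("D", "02:27", "02:28"), ("D", "02:28", "02:31"),
    ("E", "02:27", "02:28"), ("E", "02:29", "02:31"),
]

# Step 1 (the subquery): collect the names that share each (startts, endts).
names_by_interval = defaultdict(list)
for name, startts, endts in rows:
    names_by_interval[(startts, endts)].append(name)

# Step 2 (the outer query): count how often each sorted name list occurs.
group_counts = Counter(
    ",".join(sorted(names)) for names in names_by_interval.values()
)

# Extra step, not part of the queries above: expand every group into
# unordered pairs to obtain per-pair counts.
pair_counts = Counter()
for names in names_by_interval.values():
    pair_counts.update(combinations(sorted(set(names)), 2))

print(dict(group_counts))
print(dict(pair_counts))
```

Note that the grouped result counts whole sets of concurrent names, so on the toy data "A,B" is counted twice and "A,C,D,E" once, while names that occur alone in an interval appear as single-name groups.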

