Postgresql 9.3:如何使用具有多个索引的交叉表?

Postgresql 9.3: How to use crosstab with multiple indexes?

这是一个 sqlFiddle 显示我正在尝试做的事情。
这是 @lad2025 sqlFiddle 显示得更好

我的 table 上有两个索引,外加一个包含列名的列和一个包含值的列。

在 fiddle 中,我显示了一个执行我想执行的操作的查询。但它很慢。

我有一个交叉表请求,它做几乎相同的事情,速度非常快,但几乎没有错误。 (它会融合一些线)

SELECT 
    end_user_id, 
    tms, 
    coalesce(max(IN_VEHICLE), 0) as IN_VEHICLE, 
    coalesce(max(ON_BICYCLE), 0) as ON_BICYCLE, 
    coalesce(max(ON_FOOT),    0) as ON_FOOT, 
    coalesce(max(RUNNING),    0) as RUNNING, 
    coalesce(max(STILL),      0) as STILL, 
    coalesce(max(TILTING),    0) as TILTING, 
    coalesce(max(UNKNOWN),    0) as UNKNOWN, 
    coalesce(max(WALKING),    0) as WALKING 
FROM
    crosstab (            
        'SELECT end_user_id, tms, type, max(confidence) FROM activities group by 1,2,3 ',
        'SELECT DISTINCT type FROM activities order by type'
    )as newtable (
        end_user_id text, 
        tms         timestamp,
        IN_VEHICLE  float,
        ON_BICYCLE  float,
        ON_FOOT     float,
        RUNNING     float,
        STILL       float,
        TILTING     float,
        UNKNOWN     float,
        WALKING     float
    )  
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms

我不知道为什么postgres要我GROUP BY end_user_id,最后是tms...应该是唯一的。
我也不知道为什么,但如果我不在交叉表查询中分组,我将只有一行 end_user_id :(

如何更正该交叉表请求?

编辑: @lad2025 的响应比我的更好,更优雅,而且我敢肯定更快。尽管如此,我还是想知道如何使用交叉表来做到这一点。

您可以像在 Fiddle 中那样避免 crosstab/multiple 左连接,并使用简单的条件聚合:

SELECT 
 end_user_id,
 tms,
 COALESCE(MAX(CASE WHEN type = 'IN_VEHICLE' THEN confidence END),0) AS IN_VEHICLE,
 COALESCE(MAX(CASE WHEN type = 'ON_BICYCLE' THEN confidence END),0) AS ON_BICYCLE,
 COALESCE(MAX(CASE WHEN type = 'ON_FOOT'    THEN confidence END),0) AS ON_FOOT,
 COALESCE(MAX(CASE WHEN type = 'RUNNING'    THEN confidence END),0) AS RUNNING,
 COALESCE(MAX(CASE WHEN type = 'STILL'      THEN confidence END),0) AS STILL,
 COALESCE(MAX(CASE WHEN type = 'TILTING'    THEN confidence END),0) AS TILTING,
 COALESCE(MAX(CASE WHEN type = 'UNKNOWN'    THEN confidence END),0) AS UNKNOWN,
 COALESCE(MAX(CASE WHEN type = 'WALKING'    THEN confidence END),0) AS WALKING
FROM activities
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms;

SqlFiddleDemo

输出:

╔═══════════════════╦════════════════════════════╦═════════════╦═════════════╦══════════╦══════════╦════════╦══════════╦══════════╦═════════╗
║   end_user_id     ║            tms             ║ in_vehicle  ║ on_bicycle  ║ on_foot  ║ running  ║ still  ║ tilting  ║ unknown  ║ walking ║
╠═══════════════════╬════════════════════════════╬═════════════╬═════════════╬══════════╬══════════╬════════╬══════════╬══════════╬═════════╣
║ 64e8394876a5b7f1  ║ October, 28 2015 08:24:20  ║         21  ║          8  ║       2  ║       0  ║     2  ║       0  ║      68  ║       2 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:24:41  ║         15  ║          0  ║       3  ║       0  ║    72  ║       0  ║      10  ║       3 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:17  ║          5  ║          0  ║       5  ║       0  ║    77  ║     100  ║      13  ║       5 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:32  ║          0  ║          0  ║       0  ║       0  ║   100  ║       0  ║       0  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:25:36  ║          0  ║          0  ║       0  ║       0  ║    92  ║       0  ║       8  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:27:24  ║         48  ║         48  ║       0  ║       0  ║     0  ║       0  ║       5  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:27:54  ║          0  ║          0  ║       0  ║       0  ║     0  ║     100  ║       0  ║       0 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:28:11  ║         62  ║          8  ║       3  ║       0  ║    15  ║       0  ║      13  ║       3 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:28:53  ║         35  ║          0  ║       6  ║       0  ║    37  ║       0  ║      23  ║       6 ║
║ 64e8394876a5b7f1  ║ October, 28 2015 08:29:16  ║         54  ║          2  ║       0  ║       0  ║    10  ║       0  ║      35  ║       0 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:24:41  ║         13  ║         13  ║      69  ║       3  ║     0  ║     100  ║       5  ║      67 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:33:33  ║          0  ║          0  ║     100  ║       0  ║     0  ║       0  ║       0  ║     100 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:33:38  ║          0  ║          0  ║     100  ║       0  ║     0  ║       0  ║       0  ║     100 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:34:06  ║         19  ║          6  ║      31  ║       2  ║    29  ║       0  ║      16  ║      29 ║
║ e86b0b91546194cc  ║ October, 28 2015 08:34:34  ║          3  ║          0  ║       0  ║       0  ║    95  ║       0  ║       3  ║       0 ║
╚═══════════════════╩════════════════════════════╩═════════════╩═════════════╩══════════╩══════════╩════════╩══════════╩══════════╩═════════╝

COALESCE 也是多余的(如果只允许 positive/zero 值):

SELECT 
 end_user_id,
 tms,
 MAX(CASE WHEN type = 'IN_VEHICLE' THEN confidence ELSE 0 END) AS IN_VEHICLE,
 MAX(CASE WHEN type = 'ON_BICYCLE' THEN confidence ELSE 0 END) AS ON_BICYCLE,
 MAX(CASE WHEN type = 'ON_FOOT'    THEN confidence ELSE 0 END) AS ON_FOOT,
 MAX(CASE WHEN type = 'RUNNING'    THEN confidence ELSE 0 END) AS RUNNING,
 MAX(CASE WHEN type = 'STILL'      THEN confidence ELSE 0 END) AS STILL,
 MAX(CASE WHEN type = 'TILTING'    THEN confidence ELSE 0 END) AS TILTING,
 MAX(CASE WHEN type = 'UNKNOWN'    THEN confidence ELSE 0 END) AS UNKNOWN,
 MAX(CASE WHEN type = 'WALKING'    THEN confidence ELSE 0 END) AS WALKING
FROM activities
GROUP BY end_user_id, tms
ORDER BY end_user_id, tms;

SqlFiddleDemo2

您还可以考虑为 type 列查找 table,例如 activities_type (type_id, type_name),而不是直接存储在 table 字符串 ('IN_VEHICLE', 'ON_BICYCLE', ...).[=39= 中]

附录

我不是 Postgresql 专家,但经过一番尝试:

SELECT 
  LEFT(end_user_id, strpos(end_user_id, '_')-1) AS end_user_id,
  RIGHT(end_user_id, LENGTH(end_user_id) - strpos(end_user_id, '_'))::timestamp AS tms,
  COALESCE(IN_VEHICLE,0) AS IN_VEHICLE, 
  COALESCE(ON_BICYCLE,0) AS ON_BICYCLE, 
  COALESCE(ON_FOOT,0)    AS ON_FOOT, 
  COALESCE(RUNNING,0)    AS RUNNING,  
  COALESCE(STILL,0)      AS STILL,
  COALESCE(TILTING,0)    AS TILTING, 
  COALESCE("UNKNOWN",0)  AS "UNKNOWN", 
  COALESCE(WALKING,0)    AS WALKING 
FROM crosstab(
    'SELECT (end_user_id || ''_'' || tms) AS row_id, type, confidence
    FROM activities
    ORDER BY row_id, type, confidence',
    'SELECT DISTINCT type FROM activities order by type'
    ) AS newtable (
            end_user_id text, 
            IN_VEHICLE  int,
            ON_BICYCLE  int,
            ON_FOOT     int,
            RUNNING     int,
            STILL       int,
            TILTING     int,
            "UNKNOWN"   int,
            WALKING     int)  
ORDER BY end_user_id, tms;  

为什么要连接和拆分 end_user_id + tms

因为crosstab(text,text)需要:

row_id     <=> end_user_id + tms
category   <=> type
value      <=> confidence

请注意,此版本中没有GROUP BY

附录 2 - 最终版本

基于tablefunc module doc F.37.1.4. crosstab(text, text):

这要好得多,因为它可以处理 row_id, extra_col1, extra_col2, category, value)。所以现在:

row_id      <=> id
extra_col1  <=> end_user_id
extra_col2  <=> tms
... 

最终查询:

SELECT 
    end_user_id, 
    tms,
    coalesce(max(IN_VEHICLE), 0) as IN_VEHICLE, 
    coalesce(max(ON_BICYCLE), 0) as ON_BICYCLE, 
    coalesce(max(ON_FOOT),    0) as ON_FOOT, 
    coalesce(max(RUNNING),    0) as RUNNING, 
    coalesce(max(STILL),      0) as STILL, 
    coalesce(max(TILTING),    0) as TILTING, 
    coalesce(max("UNKNOWN"),  0) as "UNKNOWN", 
    coalesce(max(WALKING),    0) as WALKING 
FROM crosstab(
'SELECT id,end_user_id , tms,  type, confidence
FROM activities',
'SELECT DISTINCT type FROM activities order by type'
) AS newtable (
        id INT,
        end_user_id text, 
        tms         timestamp,
        IN_VEHICLE  int,
        ON_BICYCLE  int,
        ON_FOOT     int,
        RUNNING     int,
        STILL       int,
        TILTING     int,
        "UNKNOWN"   int,
        WALKING     int
    )  
GROUP BY end_user_id, tms    
ORDER BY end_user_id, tms;

What would be the point of the activities_type table ?

数据库规范化,你可以使用:

SELECT DISTINCT type FROM activities order by type
vs
SELECT type_name FROM activities_types ORDER BY type_name;

这个版本使用 id 作为 row_id 所以它仍然需要 GROUP BY 来压缩多行。

总结一下:条件聚合是最具可读性的解决方案。