Merge two columns into one

Question

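I have a performance problem when merging two columns into one on a table with about 7 million rows.

My main goals are:

Merge open_time and close_time into a single 'time' column
Order by time and account_id DESC
For each account_id, check the op_type column across the entries ordered by time and account_id:
- If op_type changes from 0 to 1 (or vice versa) between the first and second entry, increment a counter by 1.
- If op_type changes again from 0 to 1 (or vice versa) between the second and third entry, increment the counter by 1 again.
- And so on for each account_id

account_id, open_time and close_time are indexed.

Option #1: Merge the open_time and close_time columns into a single 'time' column using a UNION of two select statements: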

select account_id, op_type, open_time as time, instrument_id
from tmp_operations
UNION
select account_id, op_type=0, close_time as time, instrument_id
from tmp_operations

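The UNION of the two select statements has been executing for more than 4 000 000 ms and is still running.

Option #2: Merge the open_time and close_time columns into a single 'time' column by unnesting an array: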

SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) as time,
    instrument_id
FROM risklive.operations_mt4 op

Unnesting the array takes about 315 000 ms, which is much better. Thanks to Gabriel's Messenger!

An example of the merged timestamp result I would like to see:

      open_time               close_time                       time
"2015-08-19 09:18:24"    "2015-08-19 09:20:40"          "2015-08-19 09:18:24" 
"2015-08-19 09:11:54"    "2015-08-19 09:17:16"    -->   "2015-08-19 09:20:40"
"2015-08-19 09:17:46"    "2015-08-19 09:18:22"          "2015-08-19 09:11:54"
                                                        "2015-08-19 09:17:16"
                                                        "2015-08-19 09:17:16"
                                                        "2015-08-19 09:17:46"
                                                        "2015-08-19 09:18:22"

As for the counter of op_type changes for each account_id:

account_id   op_type         time
  63004;        1;    "2015-08-19 09:18:24"
  63004;        1;    "2015-08-19 09:20:40"
  63004;        1;    "2015-08-19 09:11:54"
  63004;        1;    "2015-08-19 09:17:16"   <-- op_type will be changed in next entry
  63004;        0;    "2015-08-19 09:17:46"   <-- counter = 1
  63004;        0;    "2015-08-19 09:18:22"   <-- op_type will be changed in next entry
  63004;        1;    "2015-08-19 09:09:31"   <-- counter = 2
  63004;        1;    "2015-08-19 09:09:31"
  63004;        1;    "2015-08-19 09:31:09"
  63004;        1;    "2015-08-19 09:32:07"   <-- op_type will be changed in next entry
  63004;        0;    "2015-08-19 09:32:09"   <-- counter = 3
  63004;        0;    "2015-08-19 09:57:44"   <-- op_type will be changed in next entry
  63004;        1;    "2015-08-19 09:20:43"   <-- counter = 4
  63004;        1;    "2015-08-19 09:31:51"
  63004;        1;    "2015-08-19 09:20:59"
  63004;        1;    "2015-08-19 09:31:51"

I currently have no idea how to implement the op_type change counter shown above.

How can I tune all of this?

Answer 1

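Since you changed your question slightly after I had answered, I have completely rewritten my post as well.

Disclaimer:

You need to perform operations that merge and order the whole table (7 million rows), and that will be a bottleneck every time. You may not find a solution that satisfies you without changing your approach entirely. Still, let me try.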
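One way to check where the time actually goes is to look at the execution plan. The following is only a sketch: it assumes the table and column names from the question, and the ORDER BY is my reading of the ordering the question asks for.

```sql
-- Note: EXPLAIN ANALYZE really executes the statement,
-- so on 7 million rows it takes about as long as the query itself.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) AS time,
    instrument_id
FROM risklive.operations_mt4
ORDER BY account_id DESC, time;
```

In the output, the sort node and the scan node each report their own timing, which shows whether the sort over the doubled row count dominates.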

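First problem:

So your first problem is to "merge" two columns into one across the whole 7-million-row table. You tried UNION, which needs two seq scans. As I proposed earlier, a solution may be to build an array and unnest it (which is what you did):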

SELECT
    account_id,
    op_type,
    unnest(ARRAY[open_time, close_time]) as time,
    instrument_id
FROM risklive.operations_mt4 op
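For comparison, here is a minimal sketch of the two-select variant written with UNION ALL instead of UNION. Plain UNION also removes duplicate rows, which forces a sort or hash over the combined result; UNION ALL skips that step but still scans the table twice. I have not measured it against your data, so treat it as something to try rather than a guaranteed improvement.

```sql
-- Sketch only: same table and columns as in the question.
-- UNION ALL keeps duplicate rows, e.g. two operations of one
-- account that share the same timestamp and op_type.
SELECT account_id, op_type, open_time AS time, instrument_id
FROM risklive.operations_mt4
UNION ALL
SELECT account_id, op_type, close_time AS time, instrument_id
FROM risklive.operations_mt4;
```

Keeping duplicates also matters for the counter: with plain UNION, identical rows would be collapsed before the op_type changes are counted.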

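Second problem:

This is counting the op_type changes per account_id while ordering by the "merged" time column. For readability, I put the "merged table" into a CTE.

We have to use a subquery. At the first level we check for op_type changes in the proper order (using the lag() window function, which returns the value from the row before the current row). At the second level we sum up the number of op_type changes.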
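To see what lag() returns on its own, here is a small self-contained sketch. The inline VALUES rows are made-up sample data in the style of the question, not rows from the real table.

```sql
-- prev_op_type is the op_type of the previous row of the same
-- account in time order; it is NULL for the first row of each account.
SELECT
    account_id,
    time,
    op_type,
    lag(op_type) OVER (PARTITION BY account_id ORDER BY time) AS prev_op_type
FROM (VALUES
    (63004, 1, timestamp '2015-08-19 09:11:54'),
    (63004, 1, timestamp '2015-08-19 09:17:16'),
    (63004, 0, timestamp '2015-08-19 09:17:46'),
    (63004, 0, timestamp '2015-08-19 09:18:22'),
    (63004, 1, timestamp '2015-08-19 09:18:24')
) AS s(account_id, op_type, time)
ORDER BY account_id, time;
```

For these five rows prev_op_type comes out as NULL, 1, 1, 0, 0, so op_type differs from the previous row on the third and fifth rows.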

WITH merged_table AS (
    SELECT
        account_id,
        op_type,
        unnest(ARRAY[open_time, close_time]) as time,
        instrument_id
    FROM risklive.operations_mt4 op
)
SELECT 
    account_id, SUM(abs(zero_if_no_op_type_change)) as counter
FROM (
    SELECT
         m.account_id,
         (m.op_type - lag(m.op_type)
                 OVER (PARTITION BY m.account_id ORDER BY time)
         ) as zero_if_no_op_type_change
    FROM merged_table m
) sub
GROUP BY account_id

Unfortunately, this may still take too long for your needs. If so, I think it will be hard to improve much further.

postgresql

performance

sql-tuning
