PostgreSQL query runs sometimes slow, sometimes fast. EXPLAIN plan shows high shared read

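I have a high-throughput table that receives about 20 million inserts per day, each with N_PROCESSING_STATE = 0. A set of processes selects the new records from this table, groups them, and inserts into or updates another table. Once done, the processed records are updated to N_PROCESSING_STATE = 1, and a daily housekeeping job deletes them.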

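My problem is that the select sometimes runs fast and sometimes very slowly. I ran EXPLAIN on the same query several times within 20 minutes and need help understanding why the speed differs so much.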

Here are three examples, each run more or less right after the previous one finished:

Limit  (cost=0.56..6541.54 rows=165 width=6155) (actual time=0.088..137855.110 rows=990 loops=1)
  Buffers: shared hit=6026143 read=432018
  ->  Index Scan using ipc_message_print_proc_state on ipc_message_print  (cost=0.56..6541.54 rows=165 width=6155) (actual time=0.086..137854.983 rows=990 loops=1)
        Index Cond: (n_processing_state = 0)
        Filter: (mod((ascii(substr((c_transaction_id)::text, 1, 1)) + ascii("right"((c_transaction_id)::text, 1))), 4) = 3)
        Rows Removed by Filter: 3111
        Buffers: shared hit=6026143 read=432018
Planning Time: 0.499 ms
Execution Time: 137855.332 ms

Limit  (cost=0.56..6546.66 rows=165 width=6155) (actual time=0.063..27.692 rows=3000 loops=1)
  Buffers: shared hit=9232 read=2
  ->  Index Scan using ipc_message_print_proc_state on ipc_message_print  (cost=0.56..6546.66 rows=165 width=6155) (actual time=0.061..27.346 rows=3000 loops=1)
        Index Cond: (n_processing_state = 0)
        Filter: (mod((ascii(substr((c_transaction_id)::text, 1, 1)) + ascii("right"((c_transaction_id)::text, 1))), 4) = 3)
        Rows Removed by Filter: 8869
        Buffers: shared hit=9232 read=2
Planning Time: 0.451 ms
Execution Time: 27.992 ms

Limit  (cost=0.56..11645.97 rows=289 width=6157) (actual time=0.064..141655.565 rows=973 loops=1)
  Buffers: shared hit=6194738 read=444040 written=938
  ->  Index Scan using ipc_message_print_proc_state on ipc_message_print  (cost=0.56..11645.97 rows=289 width=6157) (actual time=0.062..141655.472 rows=973 loops=1)
        Index Cond: (n_processing_state = 0)
        Filter: (mod((ascii(substr((c_transaction_id)::text, 1, 1)) + ascii("right"((c_transaction_id)::text, 1))), 4) = 3)
        Rows Removed by Filter: 3127
        Buffers: shared hit=6194738 read=444040 written=938
Planning Time: 5.542 ms
Execution Time: 141655.720 ms

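I can see that the "fast" one only uses cached data, and that the others are reading new entries. What I don't understand is why there is such a huge difference. The fast one retrieves 3000 rows because a limit is set, which I think helps. The question is why over 400,000 records need to be read for the other two queries, and why they are not cached after the first attempt. Definitely fewer than 200,000 new records were inserted during that time.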

Is there somewhere I can look to tell whether I need to give it more memory? (shared_buffers is set to 24GB)
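One place to look is the cumulative I/O statistics view pg_statio_user_tables, which counts, per table, how many blocks were found in shared_buffers (the *_blks_hit columns) versus requested from the operating system (the *_blks_read columns). This is a minimal sketch that assumes the table is visible under the lowercase name ipc_message_print. Note that a "read" may still be served from the OS page cache, so a low hit ratio on its own does not prove that a larger shared_buffers would help.

```sql
-- Configured size of the shared buffer cache.
SHOW shared_buffers;

-- Cumulative block hits vs. reads for the table (heap) and its indexes,
-- counted since the statistics were last reset.
-- Assumption: the table name is ipc_message_print.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit / nullif(heap_blks_hit + heap_blks_read, 0), 2) AS heap_hit_pct,
       idx_blks_hit,
       idx_blks_read,
       round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 2) AS idx_hit_pct
FROM pg_statio_user_tables
WHERE relname = 'ipc_message_print';
```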

My actual query:

explain (analyze,buffers,timing) SELECT K_MESSAGE_PRINT_ID, D_PRINT_TIMESTAMP, C_MESSAGE_ID, C_TRANSACTION_ID, C_MESSAGE_PRINT_TYPE, N_MESSAGE_STATE, B_MESSAGE_ACTIVE
, FK_INFRA_OBJECT_ID, FK_FLOW_STEP_ID, FK_NEXT_FLOW_STEP_ID, FK_MESSAGE_CATEGORY, FK_ACK_USER_PROFILE_ID, FK_SERVICE_FORMAT_ID, FK_MESSAGE_PROFILE_ID
, FK_MESSAGE_TYPE_VERSION_ID, FK_MESSAGE_INSTANCE_ID, FK_WORKFLOW_ID, OS_WORKFLOW_ACTION_ID, C_REF_1, C_REF_2, C_REF_3, C_VISIBILITY_CODE
, FK_VISIBILITY_USER_PROFILE_ID, FK_VISIBILITY_GROUP_ID, FK_VISIBILITY_ORGANISATION_ID, C_TAG_INFO
, CASE WHEN CLOB_MESSAGE IS NULL THEN 'no' ELSE 'yes' END AS HAS_CLOB_MESSAGE, CASE WHEN CLOB_PROPRIETARY_MESSAGE IS NULL THEN 'no' ELSE 'yes' END AS HAS_PROPRIETARY_CLOB_MESSAGE
, CASE WHEN CLOB_MESSAGE_ERRORS IS NULL THEN 'no' ELSE 'yes' END AS HAS_CLOB_MESSAGE_ERRORS, CASE WHEN CLOB_STATUS_MSG IS NULL THEN 'no' ELSE 'yes' END AS HAS_CLOB_STATUS_MESSAGE
, C_STATUS_CODE, C_COMMENT, C_TARGET_INFO, N_PROCESSING_STATE, C_LINK_INFO, C_GATE_NAME,C_MESSAGE_SUB_STATE, C_MESSAGE_TYPE, C_MESSAGE_TYPE_2
, C_ORIGINAL_SENDER, C_FINAL_RECEIVER, C_SENDER, C_RECEIVER, C_MESSAGE_ID_2, C_MESSAGE_REF, D_VALUE_DATE, C_AMOUNT, C_AMOUNT_CURR, C_ORGANISATION
, C_ORGANISATION_2, N_AMOUNT_VALUE,C_ATT_21, C_ATT_23, C_ATT_22, C_ATT_24, C_ATT_7, C_ATT_3, C_ATT_11, C_ATT_25, C_ATT_1, C_ATT_19, C_ATT_4
, C_ATT_5, C_ATT_13, C_ATT_9, C_ATT_2, C_ATT_10, C_ATT_20, C_ATT_18, C_ATT_26, C_ATT_15, C_ATT_12, C_ATT_6, C_ATT_8, C_ATT_14, N_ATT_2, N_ATT_4
, N_ATT_13, N_ATT_14, N_ATT_1, N_ATT_6, N_ATT_12, N_ATT_3, N_ATT_11, D_ATT_3, D_ATT_1, D_ATT_2, D_ATT_4, D_ATT_5, D_ATT_6 
FROM IPC_MESSAGE_PRINT 
WHERE N_PROCESSING_STATE = 3 
AND MOD(ASCII(SUBSTR(C_TRANSACTION_ID,1,1)) + ASCII(RIGHT(C_TRANSACTION_ID, 1)),4) = 0  
limit 3000
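The MOD(...) condition is not part of the index on n_processing_state (the plans show it as a Filter, not an Index Cond), so each index entry with the right state is fetched and then tested until the LIMIT is reached. The expression appears to split rows into four buckets, presumably one per worker process; that is an assumption, though it fits the plans, where roughly one row in four passes the filter (990 returned vs. 3111 removed, 3000 vs. 8869). A worked example with made-up transaction ids:

```sql
-- Same bucketing expression as in the WHERE clause, on hypothetical ids.
-- 'ABC1': ascii('A') = 65, ascii('1') = 49, 65 + 49 = 114, 114 mod 4 = 2
-- 'TX13': ascii('T') = 84, ascii('3') = 51, 84 + 51 = 135, 135 mod 4 = 3
SELECT mod(ascii(substr('ABC1', 1, 1)) + ascii(right('ABC1', 1)), 4) AS bucket_abc1,
       mod(ascii(substr('TX13', 1, 1)) + ascii(right('TX13', 1)), 4) AS bucket_tx13;
```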

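I run a VACUUM ANALYZE on this table every 15 minutes, which made things better. I also ran a VACUUM FULL to verify that there is no bloat problem and that the indexes are OK.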
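To check whether the 15-minute VACUUM ANALYZE is keeping up, the per-table statistics view pg_stat_user_tables shows estimated live and dead tuple counts and when the table was last vacuumed and analyzed. A sketch, again assuming the lowercase table name; if n_dead_tup stays high right after a vacuum, something (such as an old open transaction) may be preventing VACUUM from removing those rows.

```sql
-- Live vs. dead tuple estimates plus the last manual/automatic
-- vacuum and analyze times. Assumption: table name ipc_message_print.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       n_tup_upd,
       n_tup_del,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'ipc_message_print';
```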

Do you have any long-running open transactions?

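PostgreSQL does not store visibility information in the index. So a query that doesn't find its 3000 entries has to run through the entire part of the index with n_processing_state = 0, visiting each table row only to find that the tuple is no longer visible (it has been updated so n_processing_state is no longer 0, or it has been deleted). This is slow. If the query finds that the tuple is no longer visible to itself or to any other existing transaction, it marks the entry as dead in the index, so the next query doesn't have to repeat the work. But if any other transaction might still want to see the tuple, it cannot mark it dead in the index. So one forgotten transaction can cause everyone else to keep visiting the same stale rows over and over.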
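To look for such a forgotten transaction, pg_stat_activity lists each backend's transaction start time (xact_start) and the oldest transaction ID its snapshot may still need (backend_xmin). A minimal sketch; sessions in the state 'idle in transaction' with an old xact_start are the usual suspects.

```sql
-- Oldest open transactions first. backend_xmin is the oldest transaction ID
-- this session's snapshot may still need; age() shows how far behind it is.
SELECT pid,
       usename,
       application_name,
       state,
       now() - xact_start AS xact_age,
       backend_xmin,
       age(backend_xmin)  AS xmin_age,
       left(query, 80)    AS last_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```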

The question is why over 400,000 records need to be read for the other two queries.

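That is 400,000 pages, not records. EXPLAIN gives no indication of how many rows were visited but found to be invisible. I'd guess a lot more than 400,000.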
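The buffer counts in EXPLAIN (BUFFERS) are blocks, 8 kB each with the default block size, so "read=432018" in the first slow plan is about 3.3 GB fetched from outside shared_buffers. The "hit" count is different: it counts buffer accesses, and the same page can be hit many times, so it does not translate into a distinct amount of data. A small worked conversion that reads the server's actual block size:

```sql
-- Convert the read count from the first slow plan into bytes.
-- 432018 blocks * 8192 bytes = 3,539,091,456 bytes with the default block size.
SELECT 432018::bigint * current_setting('block_size')::bigint                 AS read_bytes,
       pg_size_pretty(432018::bigint * current_setting('block_size')::bigint) AS read_size;
```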

Once done the processed records are updated with N_PROCESSING_STATE = 1. A daily housekeeping job deletes these records.

Is there any reason you can't delete them right away? An UPDATE followed by a DELETE creates more dead rows that need to be dealt with.
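A minimal, self-contained sketch of deleting each batch as it is claimed, using a data-modifying CTE (DELETE ... RETURNING), instead of first updating it to N_PROCESSING_STATE = 1. It uses a hypothetical temporary table demo_queue with only the three columns needed and assumes k_message_print_id is a unique key. In the real job, the DELETE would run in the same transaction as the insert/update into the other table, so a failure rolls the delete back as well.

```sql
-- Hypothetical stand-in for ipc_message_print with only the needed columns.
CREATE TEMP TABLE demo_queue (
    k_message_print_id bigint PRIMARY KEY,
    c_transaction_id   text   NOT NULL,
    n_processing_state int    NOT NULL DEFAULT 0
);

-- 20 unprocessed rows with ids 'TX1' .. 'TX20'.
INSERT INTO demo_queue (k_message_print_id, c_transaction_id)
SELECT g, 'TX' || g FROM generate_series(1, 20) AS g;

-- Claim and remove one bucket's batch in a single statement.
-- The RETURNING rows are what the worker would group and write to the
-- other table; this demo only counts them.
WITH batch AS (
    DELETE FROM demo_queue
    WHERE k_message_print_id IN (
        SELECT k_message_print_id
        FROM demo_queue
        WHERE n_processing_state = 0
          AND mod(ascii(substr(c_transaction_id, 1, 1)) + ascii(right(c_transaction_id, 1)), 4) = 3
        LIMIT 3000
    )
    RETURNING k_message_print_id, c_transaction_id
)
SELECT count(*) AS deleted_rows FROM batch;
```

With this pattern each processed row leaves one dead tuple (from the DELETE) rather than two (one from the UPDATE, one from the later housekeeping DELETE).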