PostgreSQL Crashes When Doing Bulk Inserts with Node.js/Sequelize

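A Node.js application using the Sequelize.js ORM is performing bulk inserts into a PostgreSQL 11.2 server running inside a Docker container on a Mac host system. Each bulk insert typically contains about 1000-4000 rows, and the bulk insert concurrency is 30, so at most 30 insert operations are active at any time.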

const bulkInsert = async (payload) => {
    try {
        await sequelizeModelInstance.bulkCreate(payload);
    } catch (e) {
        console.log(e);
    }
};

const pLimit = require('p-limit')(30);

(async () => {
    // `data` is an array of payloads, one per bulk insert
    const promises = data.map(d => pLimit(() => bulkInsert(d))); // pLimit() controls Promise concurrency
    const result = await Promise.all(promises);
})();

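After a while, the PostgreSQL server starts throwing the error Connection terminated unexpectedly, followed by the database system is in recovery mode.

After repeating this several times and checking my logs, it appears that the error usually occurs when running a batch of 30 bulk inserts in which several of the bulk inserts contain more than 100,000 rows each. For example, one particular crash happened when attempting 3 bulk inserts of 190000, 650000, and 150000 rows together with 27 inserts of 1000-4000 rows each.

System memory is not full, CPU load is normal, and there is enough disk space available.

Question: Is it normal for PostgreSQL to crash in this situation? If so, is there a PostgreSQL setting we can tune to allow larger bulk inserts? If the large bulk inserts are the cause, does Sequelize.js have a feature that splits bulk inserts for us?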

Running PostgreSQL 11.2 in a Docker container, TimescaleDB 1.5.1, Node v12.6.0, sequelize 5.21.3, Mac Catalina 10.15.2

PostgreSQL logs right after the problem occurs

2020-01-18 00:58:26.094 UTC [1] LOG:  server process (PID 199) was terminated by signal 9
2020-01-18 00:58:26.094 UTC [1] DETAIL:  Failed process was running: INSERT INTO "foo" ("id","opId","unix","side","price","amount","b","s","serverTimestamp") VALUES (89880,'5007564','1579219200961','front','0.0000784','35','undefined','undefined','2020-01-17 00:00:01.038 +00:00'),.........
2020-01-18 00:58:26.108 UTC [1] LOG:  terminating any other active server processes
2020-01-18 00:58:26.110 UTC [220] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.110 UTC [220] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-18 00:58:26.110 UTC [220] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-18 00:58:26.148 UTC [214] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.148 UTC [214] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-01-18 00:58:26.148 UTC [214] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2020-01-18 00:58:26.149 UTC [203] WARNING:  terminating connection because of crash of another server process
2020-01-18 00:58:26.149 UTC [203] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

...

2020-01-18 00:58:30.098 UTC [1] LOG:  all server processes terminated; reinitializing
2020-01-18 00:58:30.240 UTC [223] FATAL:  the database system is in recovery mode
2020-01-18 00:58:30.241 UTC [222] LOG:  database system was interrupted; last known up at 2020-01-18 00:50:13 UTC
2020-01-18 00:58:30.864 UTC [224] FATAL:  the database system is in recovery mode
2020-01-18 00:58:31.604 UTC [225] FATAL:  the database system is in recovery mode
2020-01-18 00:58:32.297 UTC [226] FATAL:  the database system is in recovery mode
2020-01-18 00:58:32.894 UTC [227] FATAL:  the database system is in recovery mode
2020-01-18 00:58:33.394 UTC [228] FATAL:  the database system is in recovery mode
2020-01-18 01:00:55.911 UTC [222] LOG:  database system was not properly shut down; automatic recovery in progress
2020-01-18 01:00:56.856 UTC [222] LOG:  redo starts at 0/197C610
2020-01-18 01:01:55.662 UTC [229] FATAL:  the database system is in recovery mode

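Your PostgreSQL server was probably killed by the OOM Killer (Out of Memory Killer) of the Docker OS. The log line "server process (PID 199) was terminated by signal 9" is consistent with this: signal 9 is SIGKILL, which is the signal the OOM Killer sends.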

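You can:

  1. Increase the memory available to Postgres; 2GB is a low value for the volume of operations you are running.
  2. Reduce the bulk insert size and limit the concurrency of the inserts.
  3. Tune your Postgres installation to fit your hardware:
    • shared_buffers: the commonly recommended starting point is 25% of system memory. This is only a suggestion; you should always benchmark and evaluate your own scenario and pick the value that fits your environment.
    • work_mem: as explained in the following quote: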

This size is applied to each and every sort done by each user, and complex queries can use multiple working memory sort buffers. Set it to 50MB, and have 30 users submitting queries, and you are soon using 1.5GB of real memory. Furthermore, if a query involves doing merge sorts of 8 tables, that requires 8 times work_mem. You need to consider what you set max_connections to in order to size this parameter correctly. This is a setting where data warehouse systems, where users are submitting very large queries, can readily make use of many gigabytes of memory.
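The arithmetic in the quote can be written out as a quick estimate. This is a minimal sketch: the function name worstCaseWorkMemMb and its parameters are made up for illustration, and it assumes every connection runs the same number of sorts at the same time, which is a pessimistic simplification rather than how Postgres actually accounts for memory.

```javascript
// Rough upper bound, in MB, on memory used by sort buffers alone.
//   workMemMb     - the work_mem setting, in MB
//   connections   - number of sessions running queries at the same time
//   sortsPerQuery - sort operations per query (assumed equal for all queries)
function worstCaseWorkMemMb(workMemMb, connections, sortsPerQuery = 1) {
    return workMemMb * connections * sortsPerQuery;
}

// 50MB work_mem with 30 users: the 1.5GB mentioned in the quote
console.log(worstCaseWorkMemMb(50, 30));    // 1500

// Same settings if every query merge-sorts 8 tables
console.log(worstCaseWorkMemMb(50, 30, 8)); // 12000
```

Comparing such an estimate against the memory actually given to the container gives a first idea of whether a work_mem and max_connections combination is plausible.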

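There are many settings you can change in Postgres to improve performance and stability. Postgres runs fine with the default settings, but in production or under heavy load you will inevitably need to tune it.

Recommended reading: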

  1. Tuning Your PostgreSQL System
  2. PostgreSQL Documentation: Resource Consumption
  3. Configuring Memory on PostgreSQL

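I ran into a similar problem when running migrations, but the solution can be applied to this question as well.

The idea is to splice your payload into manageable chunks. In my case, 100 records at a time seemed manageable.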

const payload = require("./seeds/big-mama.json"); // about 715,000 records

module.exports = {
    up: async (queryInterface) => {
        const records = payload.map(function (record) {
            record.createdAt = new Date();
            record.updatedAt = new Date();
            return record;
        });

        // Insert 100 records at a time and wait for each chunk to finish
        // before sending the next, so only one INSERT is in flight at once.
        while (records.length > 0) {
            await queryInterface.bulkInsert(
                "Products",
                records.splice(0, 100),
                {}
            );
        }
    },

    down: (queryInterface) => {
        return queryInterface.bulkDelete("Products", null, {});
    }
};
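
The same chunking idea could be applied to the bulkCreate call from the question. The following is only a sketch: chunk and chunkedBulkCreate are hypothetical helper names, not part of Sequelize, and the default chunk size of 1000 is an assumption that should be tuned against your own data and memory limits. The model argument is any object that exposes bulkCreate(rows), such as a Sequelize model.

```javascript
// Split an array into consecutive slices of at most `size` items.
// The input array is not modified.
function chunk(rows, size) {
    if (!Number.isInteger(size) || size <= 0) {
        throw new RangeError('size must be a positive integer');
    }
    const chunks = [];
    for (let i = 0; i < rows.length; i += size) {
        chunks.push(rows.slice(i, i + size));
    }
    return chunks;
}

// Hypothetical helper: insert `rows` through model.bulkCreate one chunk
// at a time and return the number of rows sent.
async function chunkedBulkCreate(model, rows, size = 1000) {
    let inserted = 0;
    for (const part of chunk(rows, size)) {
        await model.bulkCreate(part); // wait for this chunk before sending the next
        inserted += part.length;
    }
    return inserted;
}
```

In the question's code, bulkInsert could then call chunkedBulkCreate(sequelizeModelInstance, payload) instead of bulkCreate directly. With a concurrency limit of 30 there would still be up to 30 chunks in flight at once, so the limit may need lowering as well.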