Node calling postgres function with temp tables causing "memory leak"

Question

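I have a node.js program that calls a Postgres (Amazon RDS micro instance) function, get_jobs, within a transaction, 18 times a second, using the node-postgres package by brianc.

The node code is just an enhanced version of brianc's basic client pooling example, roughly like...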

var pg = require('pg');
var conString = "postgres://username:password@server/database";

function getJobs(cb) {
  pg.connect(conString, function(err, client, done) {
    if (err) return console.error('error fetching client from pool', err);
    client.query("BEGIN;");
    client.query('select * from get_jobs()', [], function(err, result) {
      client.query("COMMIT;");
      done(); //call `done()` to release the client back to the pool
      if (err) console.error('error running query', err);
      cb(err, result);
    });
  });
}

function poll() {
  getJobs(function(err, jobs) {
    // process the jobs (getJobs passes the error first, then the result)
  });
  setTimeout(poll, 55);
}

poll(); // start polling

So Postgres gets:

2016-04-20 12:04:33 UTC:172.31.9.180(38446):XXX@XXX:[5778]:LOG:  statement: BEGIN;
2016-04-20 12:04:33 UTC:172.31.9.180(38446):XXX@XXX:[5778]:LOG:  execute <unnamed>: select * from get_jobs();
2016-04-20 12:04:33 UTC:172.31.9.180(38446):XXX@XXX:[5778]:LOG:  statement: COMMIT;

...repeated every 55 ms.

get_jobs is written with temp tables, something like this

CREATE OR REPLACE FUNCTION get_jobs (
) RETURNS TABLE (
  ...
) AS 
$BODY$
DECLARE 
  _nowstamp bigint; 
BEGIN

  -- take the current unix server time in ms
  _nowstamp := (select extract(epoch from now()) * 1000)::bigint;  

  --  1. get the jobs that are due
  CREATE TEMP TABLE jobs ON COMMIT DROP AS
  select ...
  from really_big_table_1 
  where job_time < _nowstamp;

  --  2. get other stuff attached to those jobs
  CREATE TEMP TABLE jobs_extra ON COMMIT DROP AS
  select ...
  from really_big_table_2 r
    inner join jobs j on r.id = j.some_id;

  ALTER TABLE jobs_extra ADD PRIMARY KEY (id);

  -- 3. return the final result with a join to a third big table
  RETURN query (

    select je.id, ...
    from jobs_extra je
      left join really_big_table_3 r on je.id = r.id
    group by je.id

  );

END
$BODY$ LANGUAGE plpgsql VOLATILE;

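I've used the temp table pattern because I know that jobs will always be a small extract of rows from really_big_table_1, in the hope that this will scale better than a single query with multiple joins and multiple where conditions. (I used this to great effect with SQL Server and I don't trust any query optimiser now, but please tell me if this is the wrong approach for Postgres!)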

The query against these tables runs in 8 ms (as measured from node), which leaves ample time to finish one job "poll" before the next one starts.
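To make that timing budget concrete, here is a small sketch of the arithmetic. The 55 ms interval and 8 ms query time are the figures from the question; the helper name pollBudget and the 200 ms "slow query" case are hypothetical, added only for illustration.

```javascript
// Rough timing budget for a poll that is started on a fixed timer,
// regardless of whether the previous poll has finished.
function pollBudget(intervalMs, queryMs) {
  return {
    // Polls started per second (rounded down).
    callsPerSecond: Math.floor(1000 / intervalMs),
    // Idle time left in each interval after the query returns.
    headroomMs: intervalMs - queryMs,
    // Upper bound on polls in flight at once if every query takes queryMs.
    overlapping: Math.ceil(queryMs / intervalMs)
  };
}

console.log(pollBudget(55, 8));
// { callsPerSecond: 18, headroomMs: 47, overlapping: 1 }

// Hypothetical slow case: if a query took 200 ms, several polls would overlap.
console.log(pollBudget(55, 200).overlapping); // 4
```

As long as the query stays well under 55 ms, only one poll is in flight at a time. If the server slows down, the fixed timer keeps starting new polls on top of the unfinished ones.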

Problem: After about 3 hours of polling at this rate, the Postgres server runs out of memory and crashes.

What I tried already...

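If I rewrite the function without temp tables, Postgres doesn't run out of memory, but I use the temp table pattern a lot, so this isn't a solution.
If I stop the node program (which kills the 10 connections it uses to run the queries), the memory frees up. Merely making node wait a minute between polling sessions doesn't have the same effect, so there are obviously resources that the Postgres backend associated with the pooled connection is keeping.
If I run a VACUUM while polling is going on, it has no effect on memory consumption and the server continues on its way to death.
Reducing the polling frequency only changes the amount of time before the server dies.
Adding DISCARD ALL; after each COMMIT; has no effect.
Explicitly calling DROP TABLE jobs; DROP TABLE jobs_extra; after RETURN query (), instead of ON COMMIT DROP on the CREATE TABLEs. The server still crashes.
Per CFrei's suggestion, added pg.defaults.poolSize = 0 to the node code in an attempt to disable pooling. The server still crashed, but took much longer, and swap went much higher (second peak) than in all the previous tests, which looked like the first peak. I found out later that pg.defaults.poolSize = 0 may not disable pooling as expected.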

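On the basis of this: "Temporary tables cannot be accessed by autovacuum. Therefore, appropriate vacuum and analyze operations should be performed via session SQL commands.", I tried to run a VACUUM from the node server (as an attempt to make VACUUM an "in session" command). I couldn't actually get this test working. I have many objects in my database, and VACUUM, operating on all of them, took too long to run on each job iteration. Restricting VACUUM to just the temp tables was impossible: (a) you can't run VACUUM in a transaction and (b) outside the transaction the temp tables don't exist. :P EDIT: Later on the Postgres IRC forum, a helpful chap explained that VACUUM isn't relevant for temp tables themselves, but can be useful to clean up the rows created in and deleted from pg_attribute that TEMP TABLES cause. In any case, VACUUMing "in session" wasn't the answer.
DROP TABLE IF EXISTS ... before the CREATE TABLE, instead of ON COMMIT DROP. The server still dies.
CREATE TEMP TABLE (...) and insert into ... (select...) instead of CREATE TEMP TABLE ... AS, instead of ON COMMIT DROP. The server dies.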

So is ON COMMIT DROP not releasing all the associated resources? What else could be holding memory? How do I release it?

Answer 1

I used this to great effect with SQL Server and I don't trust any query optimiser now

Then don't use them. You can still execute your queries directly, as shown below.

but please tell me if this is the wrong approach for Postgres!

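It is not an altogether wrong approach, it is just a very awkward one, because you are trying to build something that others have already implemented in a far easier-to-use form. As a result, you are making many mistakes that can lead to many problems, including memory leaks.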
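The mistakes are not listed one by one here. One reading (an assumption on my part) is that the code in the question never checks the result of BEGIN or COMMIT, and calls done() before COMMIT has finished. A minimal sketch of what careful hand-rolled handling might look like follows. The helper name queryInTransaction is hypothetical, and it only assumes a client exposing the node-postgres style query(text, values, callback) plus the done callback from the pool.

```javascript
// Runs one statement between BEGIN and COMMIT on `client`, checking every
// step. `client` is assumed to expose query(text, values, callback);
// `done` releases the client back to the pool.
function queryInTransaction(client, done, sql, params, cb) {
  // Roll back, release the connection, then report the original error.
  function fail(err) {
    client.query('ROLLBACK', [], function () {
      done();
      cb(err);
    });
  }

  client.query('BEGIN', [], function (err) {
    if (err) return fail(err);
    client.query(sql, params, function (err, result) {
      if (err) return fail(err);
      client.query('COMMIT', [], function (err) {
        if (err) return fail(err);
        done(); // release only after COMMIT has actually finished
        cb(null, result);
      });
    });
  });
}
```

Inside the pg.connect callback from the question this would be called as queryInTransaction(client, done, 'select * from get_jobs()', [], cb). Even this small case needs three nested callbacks and an explicit rollback path.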

Compare that with the simplicity of the exact same example using pg-promise:

var pgp = require('pg-promise')();
var conString = "postgres://username:password@server/database";
var db = pgp(conString);

function getJobs() {
    return db.tx(function (t) {
        return t.func('get_jobs');
    });
}

function poll() {
    getJobs()
        .then(function (jobs) {
            // process the jobs
        })
        .catch(function (error) {
            // error
        });

    setTimeout(poll, 55);
}

poll(); // start polling

And it gets even simpler when using ES6 syntax:

var pgp = require('pg-promise')();
var conString = "postgres://username:password@server/database";
var db = pgp(conString);

function poll() {
    db.tx(t=>t.func('get_jobs'))
        .then(jobs=> {
            // process the jobs
        })
        .catch(error=> {
            // error
        });

    setTimeout(poll, 55);
}

poll(); // start polling

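The only thing I didn't quite understand in your example is the use of a transaction to execute a single SELECT. This is not what transactions are generally for, as you are not changing any data. I assume you were trying to shrink a real piece of code that you have, which changes some data also.

In case you do not need a transaction, your code can be further reduced to: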

var pgp = require('pg-promise')();
var conString = "postgres://username:password@server/database";
var db = pgp(conString);

function poll() {
    db.func('get_jobs')
        .then(jobs=> {
            // process the jobs
        })
        .catch(error=> {
            // error
        });

    setTimeout(poll, 55);
}

poll(); // start polling

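UPDATE

It is, however, a dangerous approach not to control the end of the previous request before starting the next one, and this too may create memory/connection issues.

A safe approach should be: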

function poll() {
    db.tx(t=>t.func('get_jobs'))
        .then(jobs=> {
            // process the jobs

            setTimeout(poll, 55);
        })
        .catch(error=> {
            // error

            setTimeout(poll, 55);
        });
}
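The same idea can be packaged as a small reusable loop. This is only a sketch under a few assumptions: startPolling, fetchJobs and handleJobs are hypothetical names, fetchJobs is any function returning a promise (for example () => db.tx(t => t.func('get_jobs'))), and the environment supports async/await.

```javascript
// Calls fetchJobs(), waits for it (and the handler) to settle, and only then
// schedules the next call. Returns a function that stops the loop.
function startPolling(fetchJobs, handleJobs, delayMs) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      const jobs = await fetchJobs();
      await handleJobs(jobs);
    } catch (error) {
      console.error('poll failed', error);
    } finally {
      // Reschedule on success and on failure alike, unless stopped.
      if (!stopped) timer = setTimeout(tick, delayMs);
    }
  }

  tick();

  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

Because the next timer is armed only in the finally block, at most one request is in flight at any time, however slow the server gets. The trade-off is that the effective polling period becomes the request time plus delayMs rather than a fixed 55 ms.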

Answer 2

Use CTEs to create partial result sets instead of temp tables.

CREATE OR REPLACE FUNCTION get_jobs (
) RETURNS TABLE (
  ...
) AS 
$BODY$
DECLARE 
  _nowstamp bigint; 
BEGIN

  -- take the current unix server time in ms
  _nowstamp := (select extract(epoch from now()) * 1000)::bigint;  

  RETURN query (

    --  1. get the jobs that are due
    WITH jobs AS (

      select ...
      from really_big_table_1 
      where job_time < _nowstamp

    --  2. get other stuff attached to those jobs
    ), jobs_extra AS (

      select ...
      from really_big_table_2 r
        inner join jobs j on r.id = j.some_id

    ) 

    -- 3. return the final result with a join to a third big table
    select je.id, ...
    from jobs_extra je
      left join really_big_table_3 r on je.id = r.id
    group by je.id

  );

END
$BODY$ LANGUAGE plpgsql VOLATILE;

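The planner will evaluate each block in sequence, the way I wanted to achieve with temp tables. (This relies on CTEs acting as optimisation fences, which holds before PostgreSQL 12; from version 12 on, a CTE like these may be inlined into the main query unless it is declared AS MATERIALIZED.)

I know this doesn't directly solve the memory leak issue (I'm pretty sure there's something wrong with Postgres' implementation of temp tables, at least in the way they manifest on the RDS configuration).

However, the query works, it is planned the way I intended, and memory usage is now stable after 3 days of running the job. My server doesn't crash.

I didn't change the node code at all.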

postgresql

node.js

node-postgres