Child process exiting with SIGTERM (possibly due to exceeding maxBuffer); how can I confirm that it's a buffer problem and fix it?

Question

Problem

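My child process exits after about 30 minutes with SIGTERM and no other debug output. Given the information in "Node.js child process exits with SIGTERM", I think the process is most likely exiting because it exceeds its maxBuffer: the uptime is non-deterministic and it does improve when I increase maxBuffer. With the default maxBuffer of 205 KB, it consistently runs for 1-3 minutes; with 10 MB, it consistently runs for 30-60 minutes.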

Aim

The child process produces a text stream at an average rate of about 1 MB every 10 minutes (1.66 KB per second).

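The log entries in the text stream are multi-line (see the example below of the lines that make up one log entry), so I'm using Node to parse them line by line and extract the information of interest (from * << Request >> until - End):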

*   << Request  >> 113214123 
-   Begin          req 113214077 rxreq
-   ReqMethod      GET
-   ReqURL         /ping
-   RespStatus     200
-   End

Code

const { exec } = require('child_process');
const { createInterface } = require('readline');

const cp = exec("tail -F 2021-02-25.log", { maxBuffer: 10000000 });

createInterface(cp.stdout, cp.stdin)
.on('line', line => {
    // ...
    // (Implementation not shown, as it's hundreds of lines long):
    // Add the line to our line-buffer, and if we've reached "-   End   " yet, parse
    // those lines into a corresponding JS object and clear the line-buffer, ready
    // to receive another line.
    // ...
});

cp.on('close', (code, signal) => {
    console.error(`Child process exiting unexpectedly. Code: ${code}; signal: ${signal}.`);
    process.exit(1);
});

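Question

Essentially, "how can I avoid getting SIGTERM", but more specifically:

How can I confirm that the SIGTERM really is being received because the child process exceeded its buffer? For example, is there a way to inspect the child process's buffer usage while it runs?
Could the buffer be getting overloaded because Node takes too long to execute the line-parsing function? Is there a way to monitor this?
Am I missing an extra step that I need to take, such as manually flushing some buffer?

I feel that throwing extra buffer at the problem is the wrong way to solve it; 10 MB already seems like a lot, and I need to be able to guarantee indefinite uptime (rather than adding a bit more buffer each time it fails).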

Answer 1

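How to diagnose that a child process exited due to exceeding its buffer

I searched the tests in the Node.js codebase for maxBuffer and found one that shows how to diagnose that a child process exited due to exceeding its allotted maxBuffer, which I'll reproduce here: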

// One of the tests from the Node.js codebase:
{
  const cmd =
    `"${process.execPath}" -e "console.log('a'.repeat(1024 * 1024))"`;

  cp.exec(cmd, common.mustCall((err) => {
    assert(err instanceof RangeError);
    assert.strictEqual(err.message, 'stdout maxBuffer length exceeded');
    assert.strictEqual(err.code, 'ERR_CHILD_PROCESS_STDIO_MAXBUFFER');
  }));
}
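The test above relies on Node's internal test helpers (`common`, and `cp`/`assert` imported elsewhere), so it won't run on its own. Below is a minimal standalone sketch of the same check. The function name `demoMaxBufferOverflow` and the 1 KiB limit are my own choices for illustration, not part of the Node.js test. It also listens for "close", to show how the overflow and the reported signal relate: `exec()` kills the child with its `killSignal` option, which defaults to 'SIGTERM'.

```javascript
const { exec } = require('child_process');

// The child prints about 1 MiB, while the parent only allows 1 KiB of stdout.
const cmd =
  `"${process.execPath}" -e "console.log('a'.repeat(1024 * 1024))"`;

// Hypothetical helper: resolves with what the callback and "close" reported.
function demoMaxBufferOverflow() {
  return new Promise((resolve) => {
    const result = {};
    const child = exec(cmd, { maxBuffer: 1024 }, (err) => {
      // On overflow, err should be a RangeError carrying this code.
      result.isRangeError = err instanceof RangeError;
      result.errorCode = err && err.code;
      result.message = err && err.message;
    });
    // exec() registers its own "close" listener first, so the callback
    // above has already run by the time this listener fires.
    child.on('close', (exitCode, signal) => {
      resolve({ ...result, exitCode, signal });
    });
  });
}

demoMaxBufferOverflow().then((result) => console.log(result));
```

If the error code is ERR_CHILD_PROCESS_STDIO_MAXBUFFER, the termination was triggered by the buffer limit rather than by something external sending a signal.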

So I added the equivalent diagnostic to my app:

const { exec } = require('child_process');
const { createInterface } = require('readline');

/**
 * This termination callback is distinct from listening for the "error" event
 * (which does not fire at all, in the case of buffer overflow).
 * @see https://nodejs.org/api/child_process.html#child_process_event_error
 * @see https://nodejs.org/api/child_process.html#child_process_child_process_exec_command_options_callback
 * @param {import("child_process").ExecException | null} error
 * @param {string} stdout
 * @param {string} stderr
 */
function terminationCallback(error, stdout, stderr){
    if(error === null){
        // Healthy termination. We'll get an exit code and signal from
        // the "close" event handler instead, so will just defer to those
        // logs for debug.
        return;
    }
    console.log(
        `[error] Child process got error with code ${error.code}.` + 
        ` instanceof RangeError: ${error instanceof RangeError}.` +
        ` Error message was: ${error.message}`
    );
    console.log(`stderr (length ${stderr.length}):\n${stderr}`);
    console.log(`stdout (length ${stdout.length}):\n${stdout}`);
}

const cp = exec(
    "tail -F 2021-02-25.log",
    { maxBuffer: 10000000 },
    terminationCallback
);

createInterface(cp.stdout, cp.stdin)
.on('line', line => {
    // ...
    // Implementation not shown
    // ...
});

cp.on('close', (code, signal) => {
    console.error(
        `Child process exiting unexpectedly. ` + 
        `Code: ${code}; signal: ${signal}.`
    );
    process.exit(1);
});

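When I ran my app for a few minutes, sure enough, this termination callback was called, and it satisfied all of the assertions that the Node.js test expects for a child process that exits due to exceeding its buffer.

I also noticed that the stdout returned to the termination callback was exactly 1000000 characters long, which matched exactly the number of bytes I had set as maxBuffer. It was at this point that I came to understand the difference between require("child_process").exec() and require("child_process").spawn().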

How to make a child process that can safely stream any amount of data from `stdout`

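exec() and spawn() have overlapping functionality, but are ultimately suited to different purposes, which is not really spelled out in the Child Process documentation. The clue is in the arguments they accept.

exec() accepts a termination callback, and its options support maxBuffer (but not stdio).
spawn() does not accept a termination callback, and its options support stdio (but not maxBuffer).

The headline here is:

exec() is suited to tasks that have a definite end (at which point you harvest the stdout/stderr that the child process has been accumulating in its buffer the whole time).
spawn() is suited to tasks that could run indefinitely, because you can configure where the stdout/stderr/stdin streams are piped to. The default configuration for options.stdio, "pipe", pipes them to the parent process (your Node.js app), which is what we want in our case, where we need to set up a readline interface and consume stdout line by line. There is no explicit buffer limit apart from whatever the operating system itself imposes (which should be quite generous!).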
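To illustrate the streaming side, here is a small sketch in which a spawn()ed child emits roughly 5 MB of output and the parent consumes it line by line without retaining it. The child is a stand-in Node one-liner rather than `tail -F`, so that the example terminates. The names `script` and `countLines` are hypothetical.

```javascript
const { spawn } = require('child_process');
const { createInterface } = require('readline');

// Stand-in for a chatty child: prints 50,000 lines of 100 characters each.
const script =
  "for (let i = 0; i < 50000; i++) console.log('x'.repeat(100))";

// Hypothetical helper: streams the child's stdout and counts its lines.
function countLines() {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', script]);
    let lines = 0;

    // Fires if the process could not be started at all (e.g. ENOENT).
    child.on('error', reject);

    // Each line is handled as it arrives; nothing accumulates in a
    // maxBuffer-style buffer on the parent side.
    createInterface({ input: child.stdout }).on('line', () => {
      lines += 1;
    });

    // "close" fires once the process has ended and its stdio has closed.
    child.on('close', (code, signal) => resolve({ lines, code, signal }));
  });
}

countLines().then((result) => console.log(result));
```

The same amount of output sent through exec() with a small maxBuffer would have ended with the child being killed instead.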

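So, if you're writing a Node.js app that manages a child process whose task runs indefinitely:

Watching logs (e.g. tail -F 2021-02-25.log) non-stop and parsing them
Running an always-on livestreaming service (e.g. ffmpeg <some complex args here>)

...you should use spawn()!

Conversely, for tasks with a definite end and a predictable, reasonable buffer size (e.g. mkdir -vp some/dir/path or rsync --verbose <src> <dest>), you can go ahead and use exec()!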
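For completeness, here is a sketch of the exec() side of that trade-off: a short task whose full output is collected and handed over once the process ends. It uses `util.promisify(exec)`, which resolves with an object holding `stdout` and `stderr`. The command is a portable Node one-liner standing in for something like mkdir, and `runShortTask` is a hypothetical name.

```javascript
const { exec } = require('child_process');
const { promisify } = require('util');

const execAsync = promisify(exec);

// Hypothetical helper: runs a short, bounded task and returns its output.
async function runShortTask() {
  // "node -p" evaluates the expression and prints the result.
  const { stdout, stderr } = await execAsync(
    `"${process.execPath}" -p "6 * 7"`
  );
  return { stdout: stdout.trim(), stderr };
}

runShortTask().then((result) => console.log(result));
```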

There may of course be other differences between the two, but this aspect of stream handling is really impactful.

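How to rewrite using `spawn()`

Only two lines need to change (and one of them is just the import statement)! Note that the default options.stdio value of "pipe" is appropriate here, so we don't even need to pass in an options object.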

const { spawn } = require('child_process');
const { createInterface } = require('readline');

const cp = spawn("tail", ["-F", "2021-02-25.log"]);

createInterface(cp.stdout, cp.stdin)
.on('line', line => {
    // ...
    // (Implementation not shown, as it's hundreds of lines long):
    // Add the line to our line-buffer, and if we've reached "-   End   " yet, parse
    // those lines into a corresponding JS object and clear the line-buffer, ready
    // to receive another line.
    // ...
});

cp.on('close', (code, signal) => {
    console.error(`Child process exiting unexpectedly. Code: ${code}; signal: ${signal}.`);
    process.exit(1);
});
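The 'line' handler itself is elided above. As a rough sketch of the line-buffer approach described in its comments, the following groups lines into one object per log entry. It assumes that every entry starts with a line beginning with "*" and ends with a "-   End" line, as in the sample entry from the question. The name `createEntryParser` and the shape of the resulting object are my own assumptions, and repeated tags within one entry would overwrite each other in this simplified version.

```javascript
// Hypothetical helper: returns a 'line' handler that calls onEntry once per
// complete log entry.
function createEntryParser(onEntry) {
  let fields = null; // null while we are between entries

  return function onLine(line) {
    if (line.startsWith('*')) {
      fields = []; // a "*   << Request  >> ..." line opens a new entry
      return;
    }
    if (fields === null) return; // e.g. the tail end of a partial first entry

    // Lines look like "-   ReqMethod      GET": a tag, then an optional value.
    const match = /^-\s+(\S+)\s*(.*)$/.exec(line);
    if (!match) return;
    const [, tag, value] = match;

    if (tag === 'End') {
      onEntry(Object.fromEntries(fields));
      fields = null; // ready to receive the next entry
    } else {
      fields.push([tag, value.trim()]);
    }
  };
}

// Demo with the sample entry from the question.
const sampleLines = [
  '*   << Request  >> 113214123 ',
  '-   Begin          req 113214077 rxreq',
  '-   ReqMethod      GET',
  '-   ReqURL         /ping',
  '-   RespStatus     200',
  '-   End',
];
const onSampleLine = createEntryParser((entry) => console.log(entry));
sampleLines.forEach(onSampleLine);
```

In the spawn() version above, a handler like this could be passed directly to `.on('line', ...)` on the readline interface.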
