Streaming data to gzip file crashes with "JavaScript heap out of memory" (NodeJS)

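I have this small script that dumps a bunch of text data from a source to disk as gzip. Most of the sources I pull from at work are fine, but I've hit one that throws "JavaScript heap out of memory".

Here's a snippet of what it's doing: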

const fs = require('fs');
const zlib = require('zlib');

const file = fs.createWriteStream('file.gz');
const gzip = zlib.createGzip();
gzip.pipe(file);

// ... code to connect to someDataSource would be here

someDataSource.on('data', (line) => { // feeding lines of text
    gzip.write(line);
});

someDataSource.on('done', () => {
    // crashes before this point
    gzip.end();
});

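I suspect the zlib module is buffering more than it should before flushing to disk. The gz file is only about 4MB at the time of the crash. As I said above, the other data sources I pull from at work all produced gz files larger than 50MB.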

The documentation for the module is here: https://nodejs.org/api/zlib.html#zlib_class_options

I'm not sure how to tune the options to make it behave.

The crash:

<--- Last few GCs --->

[33692:0x10264e000]    97556 ms: Scavenge 1370.6 (1411.7) -> 1363.3 (1412.2) MB, 4.5 / 0.0 ms  (average mu = 0.174, current mu = 0.137) allocation failure 
[33692:0x10264e000]    97569 ms: Scavenge 1371.0 (1412.2) -> 1363.7 (1413.7) MB, 4.5 / 0.0 ms  (average mu = 0.174, current mu = 0.137) allocation failure 
[33692:0x10264e000]    97582 ms: Scavenge 1371.3 (1413.7) -> 1364.0 (1430.2) MB, 4.5 / 0.0 ms  (average mu = 0.174, current mu = 0.137) allocation failure 


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0xdd88f3dbe3d]
Security context: 0x32b80cc1e6e9 <JSObject>
    1: /* anonymous */(aka /* anonymous */) [0x32b897904941] [/some/path/node_modules/tedious/lib/token/stream-parser.js:~154] [pc=0xdd88f6fbec4](this=0x32b8101826f1 <undefined>)
    2: valueParse(aka valueParse) [0x32b8c73a8ab9] [/some/path/node_modules/tedious/lib/value-parser.js:~74] [pc=0xdd88f6c96d3](this=0x32b8101826f1 ...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003c597 node::Abort() [/usr/local/bin/node]
 2: 0x10003c7a1 node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1001ad575 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x100579242 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x10057bd15 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
 6: 0x100577bbf v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0x100575d94 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x100574998 v8::internal::Heap::HandleGCRequest() [/usr/local/bin/node]
 9: 0x10052a1c8 v8::internal::StackGuard::HandleInterrupts() [/usr/local/bin/node]
10: 0x1007d9bb1 v8::internal::Runtime_StackGuard(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
11: 0xdd88f3dbe3d 
12: 0xdd88f6fbec4 
13: 0xdd88f6c96d3 
14: 0xdd88f6c8870 
[1]    33692 abort      node app.js

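Add a drain event listener. gzip.write() returns immediately without waiting for the data to be compressed and written out; it queues the chunk in memory and returns false once the stream's internal buffer is full. If the source keeps emitting data faster than gzip can process it, that queue grows until the heap runs out. Pause the source when write() returns false and resume it on drain (this assumes someDataSource supports pause() and resume()):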

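someDataSource.on('data', (line) => { // feeding lines of text
    // write() returns false once gzip's internal buffer is full
    const ok = gzip.write(line);
    if (!ok) {
        someDataSource.pause(); // stop reading until gzip catches up
    }
});

// 'drain' fires once the buffered data has been processed
gzip.on('drain', () => {
    someDataSource.resume();
});

someDataSource.on('done', () => {
    gzip.end(); // flush the remaining data and close the output
});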
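
If the data can be consumed as an iterable instead of through events, the same backpressure rule can be written with await. This is only a minimal sketch: writeLines is a hypothetical helper name, and lines stands for any iterable of strings, neither of which comes from the question.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { once } = require('events');

// Hypothetical helper: gzip every line into outPath, waiting for 'drain'
// whenever write() reports that the internal buffer is full.
async function writeLines(lines, outPath) {
    const gzip = zlib.createGzip();
    const file = fs.createWriteStream(outPath);
    gzip.pipe(file);

    for (const line of lines) {
        if (!gzip.write(line)) {
            // Buffer is full: stop producing until gzip has caught up.
            await once(gzip, 'drain');
        }
    }

    gzip.end();
    // Resolve only after the compressed data has reached the file stream.
    await once(file, 'finish');
}
```

Calling writeLines(['a\n', 'b\n'], 'file.gz') would return a promise that settles once the file has been written.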

Or, if someDataSource is a readable stream, just use the pipe method, which handles backpressure for you:

someDataSource.pipe(gzip).pipe(file);
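
A variant of the same idea is stream.pipeline, which also forwards errors and cleans up the streams if any stage fails. The sketch below is an illustration under assumptions: gzipToFile and generateLines are hypothetical names, and a generator wrapped with Readable.from stands in for the real data source.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { Readable, pipeline } = require('stream');

// Hypothetical stand-in for the real source: yields text lines on demand.
function* generateLines(count) {
    for (let i = 0; i < count; i++) {
        yield `row ${i}\n`;
    }
}

// Compress a readable source into outPath.
// Resolves when the file is fully written, rejects if any stage errors.
function gzipToFile(source, outPath) {
    return new Promise((resolve, reject) => {
        pipeline(
            source,
            zlib.createGzip(),
            fs.createWriteStream(outPath),
            (err) => (err ? reject(err) : resolve())
        );
    });
}

// Example call (not executed here):
// gzipToFile(Readable.from(generateLines(1000000)), 'file.gz');
```

Because the generator is only pulled when downstream has room, memory use should stay flat regardless of how many lines are produced.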

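You can also try increasing the memory available to Node.js, although this only raises the limit and does not fix the missing backpressure handling: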

node --max-old-space-size=8192 your_script.js
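
To check whether the flag took effect, one option is to print the heap limit that V8 reports from inside the script. This is a small sketch using the built-in v8 module; the variable name limitMiB is just for illustration.

```javascript
const v8 = require('v8');

// heap_size_limit is reported in bytes; convert to MiB for readability.
const limitMiB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
console.log(`V8 heap limit: ${limitMiB.toFixed(0)} MiB`);
```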