Best way to read a large JSON file

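I currently have a 700 MB file, and every time I try to read it I hit the memory limit (goal: import the data into Firestore using the Firestore Node.js SDK).

I tried the following libraries: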


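const fs = require('fs');
const { parser } = require('stream-json');
const { streamArray } = require('stream-json/streamers/StreamArray');

// NOTE: the enclosing function header was not part of the post; `processFile` is a
// placeholder name. `args` and `totalSetCount` are defined elsewhere in the script.
function processFile(file) {
  return fs.createReadStream(file)
    .pipe(parser())        // tokenizes the JSON text incrementally
    .pipe(streamArray())   // emits { key, value } for each top-level array element
    .on('data', async (row) => {
    //   delete row.key;
      if(row.value && typeof row.value === 'object') {
        ++totalSetCount;

      }
    })
    .on('end', async () => {
      // Final Batch commit and completion message.
      // await batchCommit(false);
      console.log(args.dryRun
        ? 'Dry-Run complete, Firestore was not updated.'
        : 'Import success, Firestore updated!'
      );
      console.log(`Total documents written: ${totalSetCount}`);
    });
}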

This is the error I get:

<--- Last few GCs --->

[63298:0x102682000]    66318 ms: Mark-sweep 1365.8 (1441.3) -> 1353.1 (1441.8) MB, 470.6 / 0.0 ms  (average mu = 0.212, current mu = 0.069) allocation failure scavenge might not succeed
[63298:0x102682000]    66796 ms: Mark-sweep 1366.4 (1442.3) -> 1352.1 (1443.3) MB, 446.4 / 0.0 ms  (average mu = 0.152, current mu = 0.065) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0xd54cf6dbe3d]
Security context: 0x364a2419e6e1 <JSObject>
    1: exec [0x364a24189231](this=0x364a321029a1 <JSRegExp <String[50]: [^\"\]{1,256}|\[bfnrt\"\\/]|\u[\da-fA-F]{4}|\">>,0x364aa7402201 <Very long string[65536]>)
    2: _processInput [0x364a32102a09] [/Users/mac-clement/Documents/projets/dpas/gcp/import-data/json-import/node_modules/stream-json/Parser.js:~107] [pc=0xd54cf9bb37b](this=0x364ac032ea19 <Tran...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x10003b125 node::Abort() [/usr/local/bin/node]
 2: 0x10003b32f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1001a8e85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 4: 0x1005742a2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 5: 0x100576d75 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
 6: 0x100572c1f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 7: 0x100570df4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
 8: 0x10057d68c v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
 9: 0x10057d70f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x10054d054 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/usr/local/bin/node]
11: 0x1007d4f24 v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0xd54cf6dbe3d
[1]    63298 abort      firestore-migrator i /Users/mac-clement/Downloads/wetransfer-ff44eb/5000.json

Any suggestions would be greatly appreciated.

It seems that adding return null; to your 'data' event handler fixes it. Your library may be accumulating unresolved promises.
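To illustrate the point about promises piling up: a stream does not wait for the promise returned by an async 'data' handler, so rows keep arriving while earlier writes are still pending. Below is a minimal sketch of one way around that, assuming the row stream can be consumed with `for await` like any Node.js readable stream. `makeWriter`, `writeRow` and `importRows` are hypothetical names; `writeRow` is only a stand-in for the real Firestore write.

```javascript
const { Readable } = require('stream');

// Hypothetical stand-in for an asynchronous Firestore write.
// It records how many writes are pending at the same time.
function makeWriter() {
  const stats = { inFlight: 0, maxInFlight: 0, written: 0 };
  async function writeRow(value) {
    stats.inFlight += 1;
    stats.maxInFlight = Math.max(stats.maxInFlight, stats.inFlight);
    await new Promise((resolve) => setImmediate(resolve)); // simulate I/O latency
    stats.inFlight -= 1;
    stats.written += 1;
  }
  return { writeRow, stats };
}

// Consume the stream one row at a time. `for await` does not pull the next
// row until the loop body (including the awaited write) has finished, so
// pending promises cannot accumulate.
async function importRows(rowStream, writeRow) {
  let total = 0;
  for await (const row of rowStream) {
    if (row.value && typeof row.value === 'object') {
      await writeRow(row.value);
      total += 1;
    }
  }
  return total;
}
```

With the pipeline from the question, the stream returned by `.pipe(streamArray())` would presumably be passed as `rowStream`; I have not tested this against a 700 MB file.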

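You should probably use a SAX strategy and read the file piece by piece. A DOM strategy means decoding the entire JSON file into a tree structure in memory. With a SAX strategy, you get an event for each separate value and its key, and you can do whatever you need with them.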
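To make the difference concrete: the DOM approach is `JSON.parse(fs.readFileSync(file))`, which needs the whole document in memory. The sketch below is a hand-rolled illustration of the piece-by-piece idea, not a replacement for a real streaming parser such as stream-json. It assumes the file is one top-level JSON array and that chunks arrive as strings (for example from a stream opened with `{ encoding: 'utf8' }`). `streamArrayElements` is a hypothetical name.

```javascript
const { Readable } = require('stream');

// Yields each element of a top-level JSON array as soon as its text is
// complete, so only one element is held in memory at a time.
async function* streamArrayElements(chunks) {
  let depth = 0;        // nesting depth of [] and {}
  let inString = false; // currently inside a JSON string literal?
  let escaped = false;  // previous character was a backslash inside a string?
  let current = '';     // text of the element being collected

  for await (const chunk of chunks) {
    for (const ch of String(chunk)) {
      if (inString) {
        current += ch;
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
      } else if (ch === '"') {
        inString = true;
        current += ch;
      } else if (ch === '[' || ch === '{') {
        if (depth > 0) current += ch; // the outermost '[' is not part of any element
        depth += 1;
      } else if (ch === ']' || ch === '}') {
        depth -= 1;
        if (depth > 0) {
          current += ch;              // closes a nested value
        } else if (current.trim() !== '') {
          yield JSON.parse(current);  // outer array closed: flush the last element
          current = '';
        }
      } else if (ch === ',' && depth === 1) {
        yield JSON.parse(current);    // a comma at depth 1 separates elements
        current = '';
      } else if (depth > 0) {
        current += ch;
      }
    }
  }
}
```

Usage would look something like `for await (const doc of streamArrayElements(fs.createReadStream(file, { encoding: 'utf8' }))) { ... }`. Appending character by character is slow, so for a 700 MB file a tested library is the better choice; the point here is only that each element can be handled and discarded before the next one is read.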