Best way to read a large JSON file
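I currently have a 700 MB file, and I always hit the memory limit when I try to read it (goal: import the data into Firestore using the Firestore Node.js SDK).
I have tried the following libraries:
- stream-json (https://github.com/uhop/stream-json)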
- JSONStream (https://github.com/dominictarr/JSONStream)
return fs.createReadStream(file)
  .pipe(parser())
  .pipe(streamArray())
  .on('data', async (row) => {
    // delete row.key;
    if (row.value && typeof row.value === 'object') {
      ++totalSetCount;
    }
  })
  .on('end', async () => {
    // Final Batch commit and completion message.
    // await batchCommit(false);
    console.log(args.dryRun
      ? 'Dry-Run complete, Firestore was not updated.'
      : 'Import success, Firestore updated!'
    );
    console.log(`Total documents written: ${totalSetCount}`);
  });
}
Here is my error:
<--- Last few GCs --->
[63298:0x102682000] 66318 ms: Mark-sweep 1365.8 (1441.3) -> 1353.1 (1441.8) MB, 470.6 / 0.0 ms (average mu = 0.212, current mu = 0.069) allocation failure scavenge might not succeed
[63298:0x102682000] 66796 ms: Mark-sweep 1366.4 (1442.3) -> 1352.1 (1443.3) MB, 446.4 / 0.0 ms (average mu = 0.152, current mu = 0.065) allocation failure scavenge might not succeed
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0xd54cf6dbe3d]
Security context: 0x364a2419e6e1 <JSObject>
1: exec [0x364a24189231](this=0x364a321029a1 <JSRegExp <String[50]: [^\"\]{1,256}|\[bfnrt\"\\/]|\u[\da-fA-F]{4}|\">>,0x364aa7402201 <Very long string[65536]>)
2: _processInput [0x364a32102a09] [/Users/mac-clement/Documents/projets/dpas/gcp/import-data/json-import/node_modules/stream-json/Parser.js:~107] [pc=0xd54cf9bb37b](this=0x364ac032ea19 <Tran...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0x10003b125 node::Abort() [/usr/local/bin/node]
2: 0x10003b32f node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
3: 0x1001a8e85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
4: 0x1005742a2 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
5: 0x100576d75 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [/usr/local/bin/node]
6: 0x100572c1f v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
7: 0x100570df4 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
8: 0x10057d68c v8::internal::Heap::AllocateRawWithLigthRetry(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
9: 0x10057d70f v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [/usr/local/bin/node]
10: 0x10054d054 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [/usr/local/bin/node]
11: 0x1007d4f24 v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [/usr/local/bin/node]
12: 0xd54cf6dbe3d
[1] 63298 abort firestore-migrator i /Users/mac-clement/Downloads/wetransfer-ff44eb/5000.json
Any suggestions would be much appreciated.
It seems that adding return null; to your 'data' event handler fixes it. The library may be accumulating unresolved promises.
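If the handler does real asynchronous work for each row (such as the commented-out batchCommit call), one possible way to keep promises from piling up is to consume the stream with for await instead of an async 'data' listener. The sketch below is not the fix described above, only an alternative under stated assumptions: processRows, handleRow and demoSource are hypothetical names, and demoSource stands in for the fs.createReadStream(file).pipe(parser()).pipe(streamArray()) pipeline from the question.

```javascript
const { Readable } = require('stream');

// Consume an object-mode stream one row at a time. The loop only asks for the
// next row after the body has finished, so at most one handleRow call is
// pending at any moment.
async function processRows(source, handleRow) {
  let count = 0;
  for await (const row of source) {
    if (row.value && typeof row.value === 'object') {
      await handleRow(row.value); // hypothetical async write, e.g. a Firestore batch
      ++count;
    }
  }
  return count;
}

// Assumption: a small in-memory stream replaces the real parser pipeline.
const demoSource = Readable.from([
  { key: 0, value: { name: 'a' } },
  { key: 1, value: { name: 'b' } },
  { key: 2, value: 'not an object' },
]);

// Track how many handler calls overlap, to show that rows are handled in sequence.
let inFlight = 0;
let maxInFlight = 0;
const demoDone = processRows(demoSource, async () => {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated slow write
  inFlight--;
});

demoDone.then((count) => {
  console.log(`rows handled: ${count}, max concurrent handlers: ${maxInFlight}`);
});
```

With an async 'data' listener the stream does not wait for the returned promise, so slow writes can overlap without bound. The loop above trades some throughput for a bounded number of pending writes.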
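You should probably use a SAX strategy and read the file piece by piece. A DOM strategy means that you decode the whole JSON file into a tree structure in memory. With a SAX strategy, you get an event for each separate value, and it is up to you to do something with it.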
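To make the difference concrete, here is a minimal sketch of the event-per-value idea using only plain JavaScript. It is an illustration, not how stream-json or JSONStream are implemented. ArrayElementReader is a hypothetical class, and it assumes the file is one top-level array whose elements are objects or arrays (primitive elements are skipped). Only the element currently being read is buffered, never the whole document.

```javascript
// Emits each element of a top-level JSON array as soon as its closing bracket
// arrives. Chunks must be strings (e.g. call setEncoding('utf8') on the file stream).
class ArrayElementReader {
  constructor(onValue) {
    this.onValue = onValue; // called once per completed element
    this.depth = 0;         // current nesting depth of {} and []
    this.inString = false;  // true while inside a JSON string
    this.escaped = false;   // true right after a backslash inside a string
    this.buffer = '';       // text of the element currently being read
  }

  write(chunk) {
    for (const ch of chunk) {
      // Depth 2 or more means we are inside an element: keep its text.
      if (this.depth >= 2) this.buffer += ch;

      if (this.inString) {
        // Brackets inside strings must not change the depth.
        if (this.escaped) this.escaped = false;
        else if (ch === '\\') this.escaped = true;
        else if (ch === '"') this.inString = false;
        continue;
      }

      if (ch === '"') {
        this.inString = true;
      } else if (ch === '{' || ch === '[') {
        this.depth++;
        if (this.depth === 2) this.buffer = ch; // first character of a new element
      } else if (ch === '}' || ch === ']') {
        this.depth--;
        if (this.depth === 1) {
          // The element is complete: parse just this small piece and hand it over.
          this.onValue(JSON.parse(this.buffer));
          this.buffer = '';
        }
      }
    }
  }
}

// Usage: the input arrives in two chunks that split an element in the middle.
const seen = [];
const reader = new ArrayElementReader((value) => seen.push(value));
reader.write('[{"id":1,"tags":["a","b"]},{"id":2,"no');
reader.write('te":"brace } in a string"}]');
console.log(seen.length); // 2
```

In a real import, the callback would write the value to Firestore rather than collect it in an array, since collecting every value would rebuild the whole tree in memory.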