Sort huge file with low RAM on node.js

Question

We have a 500 GB text file of integers, one per line. How can we sort it with Node.js using only 512 MB of RAM? I guess something like this:

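1. Split the main file into 256 MB chunks.
2. Sort each chunk.
3. Take the first line of each chunk, sort those lines, and push the result to the final file.
4. Repeat step 3 for every remaining line in the chunks.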
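
The steps above amount to an external merge sort. Here is a minimal synchronous sketch of that idea, not a tuned implementation. The names `LineReader`, `splitIntoSortedChunks`, `mergeChunks`, and `externalSort` are made up for illustration, and it assumes one integer per line, every value within `Number.MAX_SAFE_INTEGER`, and a chunk size given as a line count (`maxLines`) rather than in bytes.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Reads a file line by line through a fixed-size buffer, so memory use
// does not grow with the size of the file.
class LineReader {
  constructor(file, bufSize = 64 * 1024) {
    this.fd = fs.openSync(file, 'r');
    this.buf = Buffer.alloc(bufSize);
    this.pending = ''; // text already read but not yet returned
    this.done = false;
  }
  next() { // returns the next line, or null at end of file
    let nl;
    while ((nl = this.pending.indexOf('\n')) === -1 && !this.done) {
      const n = fs.readSync(this.fd, this.buf, 0, this.buf.length, null);
      if (n === 0) { this.done = true; fs.closeSync(this.fd); }
      else this.pending += this.buf.toString('latin1', 0, n);
    }
    if (nl === -1) { // no newline left: return the tail, if any
      if (this.pending === '') return null;
      const last = this.pending;
      this.pending = '';
      return last;
    }
    const line = this.pending.slice(0, nl);
    this.pending = this.pending.slice(nl + 1);
    return line;
  }
}

// Steps 1 and 2: cut the input into chunks of at most maxLines numbers,
// sort each chunk in memory, and write it to its own temporary file.
function splitIntoSortedChunks(input, maxLines, tmpDir) {
  const reader = new LineReader(input);
  const chunkPaths = [];
  let chunk = [];
  const flush = () => {
    if (chunk.length === 0) return;
    chunk.sort((a, b) => a - b); // numeric, not lexicographic
    const file = path.join(tmpDir, `chunk-${chunkPaths.length}.txt`);
    fs.writeFileSync(file, chunk.join('\n') + '\n');
    chunkPaths.push(file);
    chunk = [];
  };
  for (let line = reader.next(); line !== null; line = reader.next()) {
    if (line.trim() === '') continue; // skip blank lines
    chunk.push(Number(line));
    if (chunk.length >= maxLines) flush();
  }
  flush();
  return chunkPaths;
}

// Steps 3 and 4: k-way merge. Only the current head of each chunk is kept
// in memory; the smallest head is written out and replaced by the next
// line of the same chunk.
function mergeChunks(chunkPaths, output) {
  const readers = chunkPaths.map((p) => new LineReader(p));
  const nextNumber = (r) => {
    const l = r.next();
    return l === null ? null : Number(l);
  };
  const heads = readers.map(nextNumber);
  const out = fs.openSync(output, 'w');
  let batch = []; // buffer output to avoid one write call per line
  for (;;) {
    let min = -1;
    for (let i = 0; i < heads.length; i++) {
      if (heads[i] !== null && (min === -1 || heads[i] < heads[min])) min = i;
    }
    if (min === -1) break; // every chunk is exhausted
    batch.push(heads[min]);
    heads[min] = nextNumber(readers[min]);
    if (batch.length >= 10000) {
      fs.writeSync(out, batch.join('\n') + '\n');
      batch = [];
    }
  }
  if (batch.length) fs.writeSync(out, batch.join('\n') + '\n');
  fs.closeSync(out);
}

function externalSort(input, output, maxLines) {
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'extsort-'));
  mergeChunks(splitIntoSortedChunks(input, maxLines, tmpDir), output);
  fs.rmSync(tmpDir, { recursive: true, force: true });
}
```

With 500 GB of input and 256 MB chunks there would be roughly 2000 chunk files open at once during the merge, so the open-file limit may need raising, or the merge can be done in several passes. The linear scan for the minimum could also be replaced by a binary heap.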

Any ideas?

Update: thanks to user some-random-it-boy. This solution is based on a child process running the native sort utility. I think it should work :)

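var fs = require('fs'),
    spawn = require('child_process').spawn,
    // -n compares lines as numbers; plain sort would compare them as text
    sort = spawn('sort', ['-n', 'in.txt']);

var writer = fs.createWriteStream('out.txt');

// pipe() applies backpressure, so unwritten output does not pile up in RAM,
// and it ends the writer once sort's stdout is finished
sort.stdout.pipe(writer);

sort.on('close', function (code) {
  if (code) console.log('sort exited with code ' + code); // non-zero means an error
});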

Answer 1

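If your integers aren't too large, you can also try converting them to strings, comparing them with a custom comparator, and then converting them back to integers.
Background: JS uses 64 bits for every Number (integers included), while a string is a sequence of 16-bit unsigned integer "elements". (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Data_structures)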

I don't know whether this really helps with memory, or whether it is worth the effort, but maybe it helps.
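
For illustration, a custom comparator of the kind described here could look like the sketch below. The name `compareIntStrings` is made up, and it assumes non-negative decimal integers written without leading zeros or a sign. It makes no claim about actual memory savings, which depend on how the engine stores strings.

```javascript
// Orders decimal integer strings numerically without converting them to
// Number. Assumes non-negative values with no leading zeros, so a longer
// string is always the larger number, and equal-length strings can be
// compared character by character.
function compareIntStrings(a, b) {
  if (a.length !== b.length) return a.length - b.length;
  if (a < b) return -1;
  if (a > b) return 1;
  return 0;
}

const sorted = ['100', '9', '25', '7'].sort(compareIntStrings);
console.log(sorted); // [ '7', '9', '25', '100' ]
```

One side effect of comparing strings directly is that values beyond `Number.MAX_SAFE_INTEGER` keep their exact order.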

Answer 2

I hate giving a non-JS solution to a JS question. But since you are in a Node environment, why not delegate this task to a process designed specifically for it?

Using Node's child_process module, call the sort command (docs here) with whatever parameters you need.

Quoting from this answer:

According to the algorithm used by sort, it will use memory according to what is available: half of the biggest number between TotalMem/8 and AvailableMem. So, for example, if you have 4 GB of available mem (out of 8 GB), sort will use 2GB of RAM. It should also create many 2 GB files in /bigdisk and finally merge-sort them.

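This is basically what you suggested, already implemented in C and running on bare hardware with no interpreter in between. I guess you can't get any faster than that within your restrictions :)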
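
To keep sort within the 512 MB budget rather than letting it pick its own buffer size, the arguments could be built as in the sketch below. It assumes GNU sort, whose `-n`, `-S` (buffer size), `-T` (temporary directory), and `-o` (output file) options are used here. The helpers `buildSortArgs` and `sortFile` and the paths in the usage comment are made up for illustration.

```javascript
const { spawn } = require('child_process');

// Builds the argument list for GNU sort:
//   -n  compare lines as numbers
//   -S  cap the in-memory buffer (e.g. '256M')
//   -T  directory for the temporary chunk files
//   -o  write the result to a file instead of stdout
function buildSortArgs(input, output, opts = {}) {
  const args = ['-n', '-S', opts.bufferSize || '256M'];
  if (opts.tmpDir) args.push('-T', opts.tmpDir);
  args.push('-o', output, input);
  return args;
}

// Runs sort as a child process. Node only waits for the exit code, so the
// data itself never passes through the JS heap.
function sortFile(input, output, opts) {
  return new Promise((resolve, reject) => {
    const child = spawn('sort', buildSortArgs(input, output, opts), {
      stdio: ['ignore', 'ignore', 'inherit'], // show sort's errors on stderr
    });
    child.on('error', reject); // e.g. sort is not installed
    child.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error('sort exited with code ' + code));
    });
  });
}

// Usage (hypothetical paths):
// sortFile('in.txt', 'out.txt', { bufferSize: '256M', tmpDir: '/bigdisk' })
//   .then(() => console.log('done'))
//   .catch((err) => console.error(err));
```

Writing with `-o` instead of piping stdout through Node avoids any buffering on the JS side.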
