NodeJS piping from tar packing to webhdfs error

I am currently building a node application that uses hadoop as long-term storage for its data while my service is not running. Because a large volume of transfers is expected and minimal processing time is preferred, the data is not written to disk but instead piped straight into whatever operation I intend to perform with it.

I am getting the following error:

\nodejs_host\node_modules\webhdfs\lib\webhdfs.js:588
    src.unpipe(req);
        ^

TypeError: src.unpipe is not a function
    at Request.onPipe (\nodejs_host\node_modules\webhdfs\lib\webhdfs.js:588:9)
    at emitOne (events.js:101:20)
    at Request.emit (events.js:188:7)
    at Pack.Stream.pipe (stream.js:103:8)
    at Object.hadoop.putServer (\nodejs_host\hadoop.js:37:29)
    at Object.<anonymous> (\nodejs_host\hadoop.js:39:8)
    at Module._compile (module.js:541:32)
    at Object.Module._extensions..js (module.js:550:10)
    at Module.load (module.js:458:32)
    at tryModuleLoad (module.js:417:12)

My code is based on the following documentation:

https://github.com/npm/node-tar/blob/master/examples/packer.js
https://github.com/harrisiirak/webhdfs/blob/master/README.md (writing to a remote file)

Here is the code I have written:

var webhdfs = require('webhdfs');
var fs = require('fs');
var tar = require('tar');
var fstream = require('fstream');

var hdfs = webhdfs.createClient({
    path: '/webhdfs/v1',
    // private
});

var hadoop = {}

hadoop.putServer = function(userid, svcid, serverDirectory, callback){  
    var readStream = fstream.Reader({path: serverDirectory, type: 'Directory'})
    var writeStream = hdfs.createWriteStream('/services/' + userid + '/' + svcid + '.tar')
    var packer = tar.Pack({noProprietary: true})

    packer.on('error', function(err){ console.error(err); callback(err, false) })
    readStream.on('error', function(err){ console.error(err); callback(err, false) })
    writeStream.on('error', function(err){ console.error(err); callback(err, false) })
    writeStream.on('finish', function(){callback(null, true)})

    readStream.pipe(packer).pipe(writeStream);
}
hadoop.putServer('1', '1', 'C:/test', function(){console.log('hadoop.putServer test done')});

The documentation suggests this should work. Would anyone be willing to point out where I am going wrong?

Having looked at lib\webhdfs.js:588 here:

req.on('pipe', function onPipe (src) {
  // Pause read stream
  stream = src;
  stream.pause();

  // This is not an elegant solution but here we go
  // Basically we don't allow pipe() method to resume reading input
  // and set internal _readableState.flowing to false
  canResume = false;
  stream.on('resume', function () {
    if (!canResume) {
      stream._readableState.flowing = false;
    }
  });

  // Unpipe initial request
  src.unpipe(req); // <-- Line 588
  req.end();
});

OK, so I looked through the issues on these modules' GitHub pages and found someone mentioning that they had dropped the tar package in favour of tar-fs. Gave it a try and it worked immediately :)

So if anyone runs into problems related to webhdfs and tar, have a look at tar-fs: https://github.com/mafintosh/tar-fs
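For completeness, here is a minimal sketch of what the working version looks like, assuming the same hdfs client setup as above. tar-fs exposes a `pack(directory)` function that returns a proper streams2 `Readable`, so webhdfs can call `unpipe()` on it and the intermediate fstream/Pack stages disappear entirely:

```javascript
var webhdfs = require('webhdfs');
var tar = require('tar-fs');

var hdfs = webhdfs.createClient({
    path: '/webhdfs/v1',
    // private
});

var hadoop = {}

hadoop.putServer = function (userid, svcid, serverDirectory, callback) {
    // tar.pack() walks the directory and returns a streams2 Readable
    // emitting the tar archive, so no separate Reader/Pack pair is needed
    var readStream = tar.pack(serverDirectory)
    var writeStream = hdfs.createWriteStream('/services/' + userid + '/' + svcid + '.tar')

    readStream.on('error', function (err) { console.error(err); callback(err, false) })
    writeStream.on('error', function (err) { console.error(err); callback(err, false) })
    writeStream.on('finish', function () { callback(null, true) })

    readStream.pipe(writeStream);
}
```

Running this requires a reachable HDFS namenode, so treat it as a sketch of the shape of the fix rather than something to copy verbatim.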