Best approach to handle millions of records from AWS S3 getObject with Node.js and return them to the frontend with pagination

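The code below fetches CSV file data from AWS S3. After fetching the data I need to manipulate the response and return it from the Node.js backend to the frontend. The problem is that once the data exceeds 200k records, it is not feasible for Node to hold it all in memory and return it to the frontend.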

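  const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
  const xlsx = require("xlsx");

  AWS.config.update({
    accessKeyId: "xxxxxxxxxxxxxxxx",
    secretAccessKey: "xxxxxxxxxxxxxxxxxxxxxxxx",
    region: "--------"
  });

  const s3 = new AWS.S3();
  const params = {
    Bucket: 'bucket',
    Key: 'userFIle/test.csv',
    Range: "bytes=7777-9999" // request only part of the object
  };

  // Runs inside an async function; the whole response body is buffered in memory.
  const data = await s3.getObject(params).promise();
  const str = data.Body.toString();

  // Body.toString() yields a UTF-8 JS string, so read it with type 'string'.
  const workBook = xlsx.read(str, { type: 'string' });

  // Convert every sheet to an array of row objects, keyed by sheet name.
  const jsonData = workBook.SheetNames.reduce((initial, name) => {
    const sheet = workBook.Sheets[name];
    initial[name] = xlsx.utils.sheet_to_json(sheet);
    return initial;
  }, {});
  console.log(jsonData, "==fffffff==", jsonData.Sheet1.length);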

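The AWS S3 SDK can be used with streams, and since CSV is a format that streams well, the file can be parsed and transformed while it is still being downloaded.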
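To make the idea concrete, here is a minimal sketch that uses only Node.js built-ins (`stream` and `readline`). The helper names `csvRows` and `countRows` are hypothetical, the comma split is deliberately naive (no quoted fields), and an in-memory stream stands in for what would presumably be `s3.getObject(params).createReadStream()` in the real application.

```javascript
const { Readable } = require("stream");
const readline = require("readline");

// Yields one parsed row at a time from any readable stream of CSV text.
// Naive parsing: splits on commas, so quoted fields with commas are not handled.
async function* csvRows(readable) {
  const lines = readline.createInterface({ input: readable, crlfDelay: Infinity });
  for await (const line of lines) {
    if (line.trim() === "") continue; // skip blank lines
    yield line.split(",");
  }
}

// Example consumer: counts rows while holding only one row in memory at a time.
async function countRows(readable) {
  let count = 0;
  for await (const row of csvRows(readable)) {
    count += 1;
  }
  return count;
}

// Stand-in for the S3 read stream; note the second row is split across chunks.
const demo = Readable.from([Buffer.from("1,9.99,apple\n2,"), Buffer.from("4.50,pear\n")]);
countRows(demo).then(n => console.log("rows:", n));
```

Because rows are handled as they arrive, memory use depends on the size of a row rather than the size of the file.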

I would suggest (as the author) using scramjet, and your code could look like this:

const {StringStream} = require("scramjet");

// `s3` is the client from the question; `params` holds Bucket and Key (no Range).
StringStream
  // Scramjet stream can be created from any stream source or a generator function
  .from(() => s3.headObject(params) // fails early with a 404 if the key is missing
    .promise()
    .then(() => s3.getObject(params).createReadStream())
  )
  // then you can run a flow of commands here
  // First we'd need to parse the items as they are being downloaded from s3.
  // if you have a header in the first line, you can pass {header: true} here,
  // see: https://www.papaparse.com/docs#config for more options
  .CSVParse()
  // here you can parse the object to your own structure
  .map(row => {
    return {
      id: row[0],
      price: row[1],
      name: row[2]
    }
  })
  // you can also use async functions or promises for every line
  // (doSomethingWithItem is a placeholder for your own function).
  .each(async function(item) { await doSomethingWithItem(item); })
  // this will print every line of your CSV while it's being downloaded
  .each(console.log)
  .run()
  .catch(error => {
    if (error.statusCode === 404) {
      // the object does not exist (missing key)
    }
  });

See the documentation here: www.scramjet.org
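For the pagination part of the question, one possible approach is to stream the file and keep only the rows that belong to the requested page. The sketch below is not scramjet's API. It uses Node.js built-ins only, `readPage` is a hypothetical helper, pages are assumed to be 1-based, and the comma split again ignores quoted fields. An in-memory stream stands in for the S3 read stream.

```javascript
const { Readable } = require("stream");
const readline = require("readline");

// Returns the rows of one page (1-based) from a readable stream of CSV text.
async function readPage(readable, page, pageSize) {
  const start = (page - 1) * pageSize; // index of the first row on the page
  const end = start + pageSize;        // index one past the last row on the page
  const rows = [];
  let index = 0;

  const lines = readline.createInterface({ input: readable, crlfDelay: Infinity });
  for await (const line of lines) {
    if (index >= start) rows.push(line.split(","));
    index += 1;
    if (index >= end) break; // page is full, no need to read further
  }

  lines.close();
  readable.destroy(); // stop pulling data from the source early
  return { page, pageSize, rows };
}

// Stand-in for the S3 read stream: five rows, page 2 with two rows per page.
const demo = Readable.from([Buffer.from("1,a\n2,b\n3,c\n4,d\n5,e\n")]);
readPage(demo, 2, 2).then(result => console.log(JSON.stringify(result.rows)));
// prints [["3","c"],["4","d"]]
```

Only `pageSize` rows are ever kept in memory, but each request still reads the object from the beginning, so later pages cost more than earlier ones. A byte `Range` like the one in the question could reduce that, provided the ranges are aligned to record boundaries.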