How to get response from S3 getObject in Node.js?

In a Node.js project I am attempting to get data back from S3.

When I use getSignedURL, everything works:

aws.getSignedUrl('getObject', params, function(err, url){
    console.log(url); 
}); 

My params are:

var params = {
              Bucket: "test-aws-imagery", 
              Key: "TILES/Level4/A3_B3_C2/A5_B67_C59_Tiles.par"
};

If I take the URL output to the console and paste it into a web browser, it downloads the file I need.

However, if I try to use getObject I get all sorts of odd behavior. I believe I'm just using it incorrectly. This is what I've tried:

aws.getObject(params, function(err, data){
    console.log(data); 
    console.log(err); 
}); 

This outputs:

{ 
  AcceptRanges: 'bytes',
  LastModified: 'Wed, 06 Apr 2016 20:04:02 GMT',
  ContentLength: '1602862',
  ETag: '"9826l1e5725fbd52l88ge3f5v0c123a4"',
  ContentType: 'application/octet-stream',
  Metadata: {},
  Body: <Buffer 01 00 00 00  ... > }

  null

So it appears that this is working properly. However, when I put a breakpoint at one of the console.logs, my IDE (NetBeans) throws an error and refuses to show the value of data. While this could just be the IDE, I decided to try other ways of using getObject:

aws.getObject(params).on('httpData', function(chunk){
    console.log(chunk); 
}).on('httpDone', function(data){
    console.log(data); 
});

This does not output anything. Placing breakpoints shows that the code never reaches either of the console.logs. I also tried:

aws.getObject(params).on('success', function(data){
    console.log(data); 
});

However, this also outputs nothing, and placing breakpoints shows that the console.log is never reached.

What am I doing wrong?

At first glance you aren't doing anything wrong, but you don't show all of your code. The following worked for me when I first checked out S3 and Node:

var AWS = require('aws-sdk');

if (typeof process.env.API_KEY == 'undefined') {
    var config = require('./config.json');
    for (var key in config) {
        if (config.hasOwnProperty(key)) process.env[key] = config[key];
    }
}

var s3 = new AWS.S3({accessKeyId: process.env.AWS_ID, secretAccessKey:process.env.AWS_KEY});
var objectPath = process.env.AWS_S3_FOLDER +'/test.xml';
s3.putObject({
    Bucket: process.env.AWS_S3_BUCKET, 
    Key: objectPath,
    Body: "<rss><data>hello Fred</data></rss>",
    ACL:'public-read'
}, function(err, data){
    if (err) console.log(err, err.stack); // an error occurred
    else {
        console.log(data);           // successful response
        s3.getObject({
            Bucket: process.env.AWS_S3_BUCKET, 
            Key: objectPath
        }, function(err, data){
            console.log(data.Body.toString());
        });
    }
});

When doing a getObject() from the S3 API, per the docs the contents of your file are located in the Body property, which you can see from your sample output. You should have code that looks something like the following:

const aws = require('aws-sdk');
const s3 = new aws.S3(); // Pass in opts to S3 if necessary

var getParams = {
    Bucket: 'abc', // your bucket name,
    Key: 'abc.txt' // path to the object you're looking for
}

s3.getObject(getParams, function(err, data) {
    // Handle any error and exit
    if (err)
        return err;

  // No error happened
  // Convert Body from a Buffer to a String
  let objectData = data.Body.toString('utf-8'); // Use the encoding necessary
});

You may not need to create a new Buffer from the data.Body object, but if you do, you can use the sample above to achieve that.
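
If you do want an explicit Buffer, here is a minimal sketch (an illustration, assuming the data object from the getObject() callback above):

// Minimal sketch, assuming `data` from the getObject() callback above.
// In aws-sdk v2, data.Body is already a Buffer, so a copy is only needed
// if you want the bytes detached from the response object.
const bodyBuffer = Buffer.from(data.Body);
const bodyString = bodyBuffer.toString('utf-8');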

@aws-sdk/client-s3 (2021 update)

Since I wrote this answer in 2016, Amazon has released a new JavaScript SDK, @aws-sdk/client-s3. This new version improves on the original getObject() by always returning a promise instead of opting in via .promise() chained to getObject(). In addition to that, response.Body is no longer a Buffer but one of Readable | ReadableStream | Blob. This changes the handling of the response data slightly. It should also be more performant, since we can stream the data returned instead of holding all of the contents in memory, at the cost of being a bit more verbose to implement.

In the example below, the response.Body data is streamed into an array and then returned as a string. This is the equivalent of my original answer. Alternatively, response.Body could be piped with stream.Readable.pipe() to an HTTP response, a file or any other kind of stream.Writable for further usage; this is the more performant approach when getting large objects.

If you wanted a Buffer, like the original getObject() response, this can be done by wrapping responseDataChunks in Buffer.concat() instead of using Array#join(); this is useful when interacting with binary data. Note that since Array#join() returns a string, each Buffer instance in responseDataChunks will have Buffer.toString() called implicitly, which uses the default utf8 encoding. (A short sketch of both of these variations follows the example below.)

const { GetObjectCommand, S3Client } = require('@aws-sdk/client-s3')
const client = new S3Client() // Pass in opts to S3 if necessary

function getObject (Bucket, Key) {
  return new Promise(async (resolve, reject) => {
    const getObjectCommand = new GetObjectCommand({ Bucket, Key })

    try {
      const response = await client.send(getObjectCommand)
  
      // Store all of data chunks returned from the response data stream 
      // into an array then use Array#join() to use the returned contents as a String
      let responseDataChunks = []

      // Handle an error while streaming the response body
      response.Body.once('error', err => reject(err))
  
      // Attach a 'data' listener to add the chunks of data to our array
      // Each chunk is a Buffer instance
      response.Body.on('data', chunk => responseDataChunks.push(chunk))
  
      // Once the stream has no more data, join the chunks into a string and return the string
      response.Body.once('end', () => resolve(responseDataChunks.join('')))
    } catch (err) {
      // Handle the error or throw
      return reject(err)
    } 
  })
}
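
As a rough sketch of the two variations mentioned above (not part of the original answer): it reuses the client and GetObjectCommand from the snippet above, the function names and file path are illustrative, and stream/promises assumes Node.js 15 or later.

const { pipeline } = require('stream/promises') // assumes Node.js 15+
const fs = require('fs')

// Variation 1: stream the object body straight into a file instead of buffering it in memory
async function getObjectToFile (Bucket, Key, filePath) {
  const response = await client.send(new GetObjectCommand({ Bucket, Key }))
  await pipeline(response.Body, fs.createWriteStream(filePath))
}

// Variation 2: collect the chunks and concatenate them into a single Buffer (useful for binary data)
async function getObjectAsBuffer (Bucket, Key) {
  const response = await client.send(new GetObjectCommand({ Bucket, Key }))
  const chunks = []
  for await (const chunk of response.Body) chunks.push(chunk)
  return Buffer.concat(chunks)
}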

@aws-sdk/client-s3 documentation links

Alternatively, you could use the minio-js client library; see get-object.js:

var Minio = require('minio')

var s3Client = new Minio({
  endPoint: 's3.amazonaws.com',
  accessKey: 'YOUR-ACCESSKEYID',
  secretKey: 'YOUR-SECRETACCESSKEY'
})

var size = 0
// Get a full object.
s3Client.getObject('my-bucketname', 'my-objectname', function(e, dataStream) {
  if (e) {
    return console.log(e)
  }
  dataStream.on('data', function(chunk) {
    size += chunk.length
  })
  dataStream.on('end', function() {
    console.log("End. Total size = " + size)
  })
  dataStream.on('error', function(e) {
    console.log(e)
  })
})

Disclaimer: I work for Minio. It's open source, S3-compatible object storage written in Golang, with client libraries available in Java, Python, JS and Golang.

Based on @peteb's answer, but using Promises and Async/Await:

const AWS = require('aws-sdk');

const s3 = new AWS.S3();

async function getObject (bucket, objectKey) {
  try {
    const params = {
      Bucket: bucket,
      Key: objectKey 
    }

    const data = await s3.getObject(params).promise();

    return data.Body.toString('utf-8');
  } catch (e) {
    throw new Error(`Could not retrieve file from S3: ${e.message}`)
  }
}

// To retrieve you need to use `await getObject()` or `getObject().then()`
const myObject = await getObject('my-bucket', 'path/to/the/object.txt');
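
If top-level await is not available in your environment (for example, a plain CommonJS script), a minimal sketch of wrapping the call in an async IIFE:

// Minimal sketch: wrap the call when top-level await is not available (e.g. CommonJS)
(async () => {
  const myObject = await getObject('my-bucket', 'path/to/the/object.txt');
  console.log(myObject);
})();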

For anyone looking for a NestJS TypeScript version of the above:

    /**
     * to fetch a signed URL of a file
     * @param key key of the file to be fetched
     * @param bucket name of the bucket containing the file
     */
    public getFileUrl(key: string, bucket?: string): Promise<string> {
        var scopeBucket: string = bucket ? bucket : this.defaultBucket;
        var params: any = {
            Bucket: scopeBucket,
            Key: key,
            Expires: signatureTimeout  // const value: 30
        };
        return this.account.getSignedUrlPromise(getSignedUrlObject, params);
    }

    /**
     * to get the downloadable file buffer of the file
     * @param key key of the file to be fetched
     * @param bucket name of the bucket containing the file
     */
    public async getFileBuffer(key: string, bucket?: string): Promise<Buffer> {
        var scopeBucket: string = bucket ? bucket : this.defaultBucket;
        var params: GetObjectRequest = {
            Bucket: scopeBucket,
            Key: key
        };
        var fileObject: GetObjectOutput = await this.account.getObject(params).promise();
        return Buffer.from(fileObject.Body.toString());
    }

    /**
     * to upload a file stream onto AWS S3
     * @param file file buffer to be uploaded
     * @param key key of the file to be uploaded
     * @param bucket name of the bucket 
     */
    public async saveFile(file: Buffer, key: string, bucket?: string): Promise<any> {
        var scopeBucket: string = bucket ? bucket : this.defaultBucket;
        var params: any = {
            Body: file,
            Bucket: scopeBucket,
            Key: key,
            ACL: 'private'
        };
        var uploaded: any = await this.account.upload(params).promise();
        if (uploaded && uploaded.Location && uploaded.Bucket === scopeBucket && uploaded.Key === key)
            return uploaded;
        else {
            throw new HttpException("Error occurred while uploading a file stream", HttpStatus.BAD_REQUEST);
        }
    }
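
A hypothetical usage sketch from an async method elsewhere in the application (the storageService property and the object key are assumptions, not part of the original answer):

// Hypothetical usage inside another async method; names and keys are illustrative
const url = await this.storageService.getFileUrl('reports/summary.pdf');
const fileBuffer = await this.storageService.getFileBuffer('reports/summary.pdf');
await this.storageService.saveFile(fileBuffer, 'reports/summary-copy.pdf');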

Here is the async / await version:

var getObjectAsync = async function(bucket,key) {
  try {
    const data = await s3
      .getObject({ Bucket: bucket, Key: key })
      .promise();
      var contents = data.Body.toString('utf-8');
      return contents;
  } catch (err) {
    console.log(err);
  }
}
var getObject = async function(bucket,key) {
    const contents = await getObjectAsync(bucket,key);
    console.log(contents.length);
    return contents;
}
getObject(bucket,key);

Extremely similar to @ArianAcosta's answer above, except that I'm using import (for Node 12.x and up), adding the AWS config, and sniffing for an image payload to apply base64 handling to the return.

// using v2.x of aws-sdk
import aws from 'aws-sdk'

aws.config.update({
  accessKeyId: process.env.YOUR_AWS_ACCESS_KEY_ID,
  secretAccessKey: process.env.YOUR_AWS_SECRET_ACCESS_KEY,
  region: "us-east-1" // or whatever
})

const s3 = new aws.S3();

/**
 * getS3Object()
 * 
 * @param { string } bucket - the name of your bucket
 * @param { string } objectKey - object you are trying to retrieve
 * @returns { string } - data, formatted
 */
export async function getS3Object (bucket, objectKey) {
  try {
    const params = {
      Bucket: bucket,
      Key: objectKey 
    }

    const data = await s3.getObject(params).promise();

    // Check for image payload and formats appropriately
    if( data.ContentType === 'image/jpeg' ) {
      return data.Body.toString('base64');
    } else {
      return data.Body.toString('utf-8');
    }

  } catch (e) {
    throw new Error(`Could not retrieve file from S3: ${e.message}`)
  }
}
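
A hedged usage sketch (the bucket and keys are made up): the base64 branch is handy for building a data URI, while the utf-8 branch suits plain-text objects.

// Illustrative usage of getS3Object() above, inside an async context; bucket and keys are made up
const jpegBase64 = await getS3Object('my-bucket', 'photos/cat.jpg');
const dataUri = `data:image/jpeg;base64,${jpegBase64}`;

const configText = await getS3Object('my-bucket', 'config/settings.json');
const settings = JSON.parse(configText);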

Converting GetObjectOutput.Body to a Promise<string> using node-fetch

In aws-sdk-js-v3 @aws-sdk/client-s3, GetObjectOutput.Body is a subclass of Readable in Node.js (specifically an instance of http.IncomingMessage) instead of a Buffer as it was in aws-sdk v2, so resp.Body.toString('utf-8') will give you the wrong result "[object Object]". Instead, the easiest way to turn GetObjectOutput.Body into a Promise<string> is to construct a node-fetch Response, which takes a Readable subclass (or a Buffer instance, or other types from the fetch spec) and has the conversion methods .json(), .text(), .arrayBuffer(), and .blob().

This also works with the other variants of aws-sdk and platforms (@aws-sdk v3 Node Buffer, v3 browser Uint8Array subclass, v2 Node Readable, v2 browser ReadableStream or Blob).

npm install node-fetch
import { Response } from 'node-fetch';
import * as s3 from '@aws-sdk/client-s3';

const client = new s3.S3Client({})
const s3Response = await client.send(new s3.GetObjectCommand({Bucket: '…', Key: '…'}));
const response = new Response(s3Response.Body);

const obj = await response.json();
// or
const text = await response.text();
// or
const buffer = Buffer.from(await response.arrayBuffer());
// or
const blob = await response.blob();

References: GetObjectOutput.Body documentation, node-fetch Response documentation, node-fetch Body constructor source, minipass-fetch Body constructor source

Thanks to kennu's comment in the GetObjectCommand usability issue.

Updated (2022)

Node.js v17.5.0 added Readable.toArray. If this API is available in your Node version, the code can be very short:

const buffer = Buffer.concat(
    await (
        await s3Client
            .send(new GetObjectCommand({
                Key: '<key>',
                Bucket: '<bucket>',
            }))
    ).Body.toArray()
)

If you are using TypeScript, you can safely cast the .Body part to Readable (the other types, ReadableStream and Blob, are only returned in browser environments; additionally, in the browser, Blob is only used by the legacy fetch API when response.body is not supported):

(response.Body as Readable).toArray()

Please note: Readable.toArray is an experimental (yet convenient) feature; use it with caution.

=============

Original answer

If you are using aws sdk v3, the SDK returns a Node.js Readable (precisely, an IncomingMessage, which extends Readable) instead of a Buffer.

Here is the TypeScript version. Note that this only works in Node; if you are sending the request from a browser, check out the longer answer in the blog post mentioned below.

import {GetObjectCommand, S3Client} from '@aws-sdk/client-s3'
import type {Readable} from 'stream'

const s3Client = new S3Client({
    apiVersion: '2006-03-01',
    region: 'us-west-2',
    credentials: {
        accessKeyId: '<access key>',
        secretAccessKey: '<access secret>',
    }
})
const response = await s3Client
    .send(new GetObjectCommand({
        Key: '<key>',
        Bucket: '<bucket>',
    }))
const stream = response.Body as Readable

return new Promise<Buffer>((resolve, reject) => {
    const chunks: Buffer[] = []
    stream.on('data', chunk => chunks.push(chunk))
    stream.once('end', () => resolve(Buffer.concat(chunks)))
    stream.once('error', reject)
})
// if readable.toArray() is support
// return Buffer.concat(await stream.toArray())

Why cast response.Body as Readable? The answer would be too long to include here; interested readers can find more information in my blog post.

The Body.toString() method no longer works with the latest version of the S3 API. Use the following instead:

const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

const streamToString = (stream) =>
    new Promise((resolve, reject) => {
      const chunks = [];
      stream.on("data", (chunk) => chunks.push(chunk));
      stream.on("error", reject);
      stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    });

(async () => {
  const region = "us-west-2";
  const client = new S3Client({ region });

  const command = new GetObjectCommand({
    Bucket: "test-aws-sdk-js-1877",
    Key: "readme.txt",
  });

  const { Body } = await client.send(command);
  const bodyContents = await streamToString(Body);
  console.log(bodyContents);
})();

Copied and pasted from here: https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-755387549

Not sure why this solution hasn't already been added, since I think it is cleaner than the top answer.