How to upload large archives to Amazon Glacier using PHP and aws-sdk v3?

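This is my first time using anything from Amazon. I am trying to use PHP SDK V3 to upload multiple files to Amazon Glacier. Amazon then needs to merge these files into a single file.

The files are stored in the cPanel home directory and must be uploaded to Amazon Glacier by a cron job.

I know I have to use the multipart upload method, but I'm not sure which other functions are needed to make it work. I'm also not sure whether the way I calculate and pass the variables is correct.

This is the code I have so far: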

<?php
require 'aws-autoloader.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;

//############################################
//DEFAULT VARIABLES
//############################################
$key = 'XXXXXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';   
$accountId = '123456789123';
$vaultName = 'VaultName';
$partSize = '4194304';
$fileLocation = 'path/to/files/';

//############################################
//DECLARE THE AMAZON CLIENT
//############################################
$client = new GlacierClient([
    'region' => 'us-west-2',
    'version' => '2012-06-01',
    'credentials' => array(
        'key'    => $key,
        'secret' => $secret,
  )
]);

//############################################
//GET THE UPLOAD ID
//############################################
$result = $client->initiateMultipartUpload([
    'partSize' => $partSize,
    'vaultName' => $vaultName
]);
$uploadId = $result['uploadId'];

//############################################
//GET ALL FILES INTO AN ARRAY
//############################################
$files = scandir($fileLocation);
unset($files[0]);
unset($files[1]);
sort($files);

//############################################
//GET SHA256 TREE HASH (CHECKSUM)
//############################################
$th = new TreeHash();
//GET TOTAL FILE SIZE
foreach($files as $part){
    $filesize = filesize($fileLocation.$part);
    $total = $filesize;
    $th = $th->update(file_get_contents($fileLocation.$part));
}
$totalchecksum = $th->complete();

//############################################
//UPLOAD FILES
//############################################
foreach ($files as $key => $part) {
    //HASH CONTENT
    $filesize = filesize($fileLocation.$part);
    $rangeSize = $filesize-1;
    $range = 'bytes 0-'.$rangeSize.'/*';
    $sourcefile = $fileLocation.$part;

    $result = $client->uploadMultipartPart([
        'accountId' => $accountId,
        'checksum' => '',
        'range' => $range,
        'sourceFile' => $sourcefile,
        'uploadId' => $uploadId,
        'vaultName' => $vaultName
    ]);
}

//############################################
//COMPLETE MULTIPART UPLOAD
//############################################
$result = $client->completeMultipartUpload([
    'accountId' => $accountId,
    'archiveSize' => $total,
    'checksum' => $totalchecksum,
    'uploadId' => $uploadId,
    'vaultName' => $vaultName,
]);
?>

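Declaring the new Glacier client seems to work, and I do receive an UploadID, but I'm not 100% sure I'm doing it right. The Amazon Glacier vault that the files should be uploaded to and then merged in is still empty, and I'm not sure whether files only show up once completeMultipartUpload has executed successfully.

I also get the following error when running the code: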

Fatal error: Uncaught exception 'Aws\Glacier\Exception\GlacierException' with message 'Error executing "CompleteMultipartUpload" on "https://glacier.us-west-2.amazonaws.com/XXXXXXXXXXXX/vaults/XXXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M"; AWS HTTP error: Client error: 403 InvalidSignatureException (client): The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details. The Canonical String for this request should have been 'POST /XXXXXXXXXXX/vaults/XXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M host:glacier.us-west-2.amazonaws.com x-amz-archive-size:1501297 x-amz-date:20151016T081455Z x-amz-glacier-version:2012-06-01 x-amz-sha256-tree-hash:?[ qiuã°²åÁ¹ý+¤Üª¤ [;K×T host;x-amz-archive-size;x-amz-date;x-amz-glacier-version;x-am in /home/XXXXXXXXXXXX/public_html/XXXXXXXXXXX/Aws/WrappedHttpHandler.php on line 152

Is there an easier way to do this? I also have full SSH access, if that helps.

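I think you have misunderstood uploadMultipartPart. uploadMultipartPart means that you upload one large file, split into multiple parts. You then call completeMultipartUpload to mark that you have finished uploading that one file.

From your code, it looks like you are uploading multiple files.

You may not actually need uploadMultipartPart.

Perhaps you could use the regular "uploadArchive"?

Reference: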

https://blogs.aws.amazon.com/php/post/Tx7PFHT4OJRJ42/Uploading-Archives-to-Amazon-Glacier-from-PHP
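
If each file is meant to become its own archive, a single-request upload could look roughly like the sketch below. It assumes `$glacier` is an `Aws\Glacier\GlacierClient` from SDK v3, created as in the question, and that the file is small enough to send in one request. The helper name `uploadSingleArchive` is hypothetical and not part of the SDK.

```php
<?php
// Hypothetical helper: uploads one file as one Glacier archive.
// $glacier is assumed to be an Aws\Glacier\GlacierClient (SDK v3).
function uploadSingleArchive($glacier, string $vaultName, string $path): string
{
    $body = fopen($path, 'rb');
    if ($body === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    $result = $glacier->uploadArchive([
        'vaultName'          => $vaultName,
        'archiveDescription' => basename($path),
        'body'               => $body, // the SDK accepts a string or a stream resource
    ]);

    // Keep the archive ID: it is needed to retrieve or delete the archive later.
    return $result['archiveId'];
}
```

Called as `uploadSingleArchive($client, $vaultName, $fileLocation . $part)` inside the question's `foreach`, this would create one archive per file rather than one merged archive.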

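I worked this out with PHP SDK V3 (version 3), and I kept running into this question during my research, so I thought I would post my solution too. Use it at your own risk; there is almost no error checking or handling.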

<?php
require 'vendor/autoload.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;


// Create the glacier client to connect with
$glacier = new GlacierClient(array(
      'profile' => 'default',
      'region' => 'us-east-1',
      'version' => '2012-06-01'
      ));

$fileName = '17mb_test_file';         // this is the file to upload
$chunkSize = 1024 * 1024 * pow(2,2);  // 1 MB times a power of 2
$fileSize = filesize($fileName);      // we will need the file size (in bytes)

// initiate the multipart upload
// it is dangerous to send the filename without escaping it first
$result = $glacier->initiateMultipartUpload(array(
      'archiveDescription' => 'A multipart-upload for file: '.$fileName,
      'partSize' => $chunkSize,
      'vaultName' => 'MyVault'
      ));

// we need the upload ID when uploading the parts
$uploadId = $result['uploadId'];

// we need to generate the SHA256 tree hash
// open the file so we can get a hash from its contents
$fp = fopen($fileName, 'rb');  // binary-safe read mode
// This class can generate the hash
$th = new TreeHash();
// feed in all of the data
$th->update(fread($fp, $fileSize));
// generate the hash (this comes out as binary data)...
$hash = $th->complete();
// but the API needs hex (thanks). PHP to the rescue!
$hash = bin2hex($hash);

// reset the file position indicator
fseek($fp, 0);

// the part counter
$partNumber = 0;

print("Uploading: '".$fileName
    ."' (".$fileSize." bytes) in "
    .(ceil($fileSize/$chunkSize))." parts...\n");
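// stop once every byte is covered; comparing against $fileSize + 1 would send
// an extra empty part when the size is an exact multiple of the chunk size
while ($partNumber * $chunkSize < $fileSize)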
{
  // while we haven't written everything out yet
  // figure out the offset for the first and last byte of this chunk
  $firstByte = $partNumber * $chunkSize;
  // the last byte for this piece is either the last byte in this chunk, or
  // the end of the file, whichever is less
  // (watch for those Obi-Wan errors)
  $lastByte = min((($partNumber + 1) * $chunkSize) - 1, $fileSize - 1);

  // upload the next piece
  $result = $glacier->uploadMultipartPart(array(
        'body' => fread($fp, $chunkSize),  // read the next chunk
        'uploadId' => $uploadId,          // the multipart upload this is for
        'vaultName' => 'MyVault',
        'range' => 'bytes '.$firstByte.'-'.$lastByte.'/*' // weird string
        ));

  // this is where one would check the results for error.
  // This is left as an exercise for the reader ;)

  // onto the next piece
  $partNumber++;
  print("\tpart ".$partNumber." uploaded...\n");
}
print("...done\n");

// and now we can close off this upload
$result = $glacier->completeMultipartUpload(array(
  'archiveSize' => $fileSize,         // the total file size
  'uploadId' => $uploadId,            // the upload id
  'vaultName' => 'MyVault',
  'checksum' => $hash                 // here is where we need the tree hash
));

// this is where one would check the results for error.
// This is left as an exercise for the reader ;)


// get the archive id.
// You will need this to refer to this upload in the future.
$archiveId = $result->get('archiveId');

print("The archive Id is: ".$archiveId."\n");


?>
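
The `$chunkSize` comment above hints at a constraint worth checking before calling initiateMultipartUpload. As far as I understand the Glacier documentation, the part size must be 1 MiB times a power of two, at most 4 GiB, and one upload may contain at most 10,000 parts. A minimal sketch of that check, with hypothetical helper names and assuming 64-bit PHP:

```php
<?php
const MIB = 1048576;

// Returns true when $partSize is 1 MiB times a power of two, up to 4 GiB.
function isValidPartSize(int $partSize): bool
{
    if ($partSize < MIB || $partSize > 4096 * MIB || $partSize % MIB !== 0) {
        return false;
    }
    $mebibytes = intdiv($partSize, MIB);
    // A power of two has exactly one bit set.
    return ($mebibytes & ($mebibytes - 1)) === 0;
}

// Smallest valid part size that keeps the upload within $maxParts parts.
function choosePartSize(int $fileSize, int $maxParts = 10000): int
{
    $partSize = MIB;
    while ($partSize * $maxParts < $fileSize) {
        $partSize *= 2;
    }
    if (!isValidPartSize($partSize)) {
        throw new InvalidArgumentException('File too large for one multipart upload');
    }
    return $partSize;
}
```

For the 17 MB test file, `choosePartSize(filesize($fileName))` returns 1 MiB; the 4 MiB used in the script is also valid, it simply produces fewer parts.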

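Note: this is a multipart upload solution for aws-sdk-php v2. I think it could also work on v3 using the TreeHash class.

Thanks to Neil's answer, I accomplished the same task, with some improvements.

Neil's code only validates a checksum over the whole file. That has two potential problems: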

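  • It can use a lot of memory: remember that we are uploading a large file; hashing it to get the checksum means opening it and reading its entire contents.
  • We are uploading the file in several parts: something could go wrong while uploading some of them, leaving corrupted file parts on AWS. If we compute and verify a checksum for every part, we can guard against that.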
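
To make the memory point concrete, here is a minimal sketch that reads a file one part at a time, so that only a single part is held in memory. The helper names `readParts` and `contentRange` are hypothetical, and the sketch assumes a regular local file.

```php
<?php
// Yields [firstByte, lastByte, data] for each part of the file at $path.
function readParts(string $path, int $partSize): Generator
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        $firstByte = 0;
        // Stop at the first empty read, so a file whose size is an exact
        // multiple of $partSize does not produce a trailing empty part.
        while (($data = fread($fp, $partSize)) !== false && $data !== '') {
            $lastByte = $firstByte + strlen($data) - 1;
            yield [$firstByte, $lastByte, $data];
            $firstByte = $lastByte + 1;
        }
    } finally {
        fclose($fp);
    }
}

// Formats the range string in the form the upload calls in this thread use.
function contentRange(int $firstByte, int $lastByte): string
{
    return "bytes {$firstByte}-{$lastByte}/*";
}
```

Each yielded `$data` could then be passed as `'body'`, and `contentRange($firstByte, $lastByte)` as `'range'`, to uploadMultipartPart.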

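In the code below, we compute the checksum of each file part that we send to AWS, and we send each checksum to the AWS API together with its file part.

Once AWS finishes receiving an uploaded part, it computes its own checksum of that part. If that checksum does not match ours, it throws an exception. If it succeeds, we know the part was uploaded successfully.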
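
For readers wondering how per-part checksums relate to the whole-file checksum, the sketch below re-implements the SHA-256 tree hash in plain PHP, as I understand it from the Glacier documentation. It is for illustration only: the function names are hypothetical, and in real code the SDK's TreeHash class is the safer choice. The `$leafSize` parameter exists only so that the idea can be tried on small strings.

```php
<?php
// Reduces a list of binary SHA-256 digests to one root digest by hashing
// adjacent pairs level by level; an unpaired digest is carried up unchanged.
function combineHashes(array $hashes): string
{
    if ($hashes === []) {
        return hash('sha256', '', true);
    }
    while (count($hashes) > 1) {
        $next = [];
        foreach (array_chunk($hashes, 2) as $pair) {
            $next[] = count($pair) === 2
                ? hash('sha256', $pair[0] . $pair[1], true)
                : $pair[0];
        }
        $hashes = $next;
    }
    return $hashes[0];
}

// Binary tree hash of $data, using $leafSize-byte leaves (1 MiB for Glacier).
function treeHash(string $data, int $leafSize = 1048576): string
{
    $leaves = $data === '' ? [''] : str_split($data, $leafSize);
    return combineHashes(array_map(
        function ($leaf) {
            return hash('sha256', $leaf, true);
        },
        $leaves
    ));
}
```

Because every part is a power-of-two number of leaves, `combineHashes()` applied to the per-part tree hashes yields the same digest as `treeHash()` over the whole file, so the file never has to be hashed in one pass. The digests here are binary; the API expects them as hex, for example via `bin2hex()`.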

<?php
require 'vendor/autoload.php'; // assumes the SDK (v2) was installed with Composer

use Aws\Common\Hash\TreeHash;
use Aws\Glacier\GlacierClient;

/**
 * upload a file and store it into aws glacier
 */
class UploadMultipartFileToGlacier
{
    // aws glacier
    private $description;
    private $glacierClient;
    private $glacierConfig;
    /*
     * the part size must be 1 MB (1024 * 1024 bytes) multiplied by a power of 2 (1 MB, 2 MB, 4 MB, 8 MB, and so on)
     * reference: https://docs.aws.amazon.com/aws-sdk-php/v2/api/class-Aws.Glacier.GlacierClient.html#_initiateMultipartUpload
     **/
    private $partSize;

    // file location
    private $filePath;

    private $errorMessage;
    private $executionDate;

    public function __construct($filePath)
    {
        $this->executionDate = date('Y-m-d H:i:s');
        $this->filePath = $filePath;
    
        // AWS Glacier
        $this->glacierConfig = (object) [
            'vaultId' => 'VAULT_NAME',
            'region' => 'REGION',
            'accessKeyId' => 'ACCESS_KEY',
            'secretAccessKey' => 'SECRET_KEY',
        ];

        $this->glacierClient = GlacierClient::factory(array(
            'credentials' => array(
                'key'    => $this->glacierConfig->accessKeyId,
                'secret' => $this->glacierConfig->secretAccessKey,
            ),
            'region' => $this->glacierConfig->region
        ));

        $this->description = sprintf('Upload file %s at %s', $this->filePath, $this->executionDate);

        $this->partSize = 1024 * 1024 * pow(2, 2); // 4 MB
    }

    public function upload()
    {
        list($success, $data) = $this->uploadFileToGlacier();

        if ($success) {
            // todo: tasks to run once the file has been uploaded successfully to AWS Glacier
        } else {
            // todo: handle error
            // $this->errorMessage contains the exception message
        }
    }

    private function completeMultipartUpload($uploadId, $fileSize, $checksumParts)
    {
        // with the checksums of all processed file parts, we can compute the checksum of the whole file. It's important
        // to send it as a parameter to GlacierClient::completeMultipartUpload. AWS computes the checksum of the
        // assembled archive on its side. If that checksum doesn't match ours, the API throws an exception.
        $checksum = $this->getChecksumFile($checksumParts);

        return $this->glacierClient->completeMultipartUpload([
            'archiveSize' => $fileSize,
            'uploadId' => $uploadId,
            'vaultName' => $this->glacierConfig->vaultId,
            'checksum' => $checksum
        ]);
    }

    private function getChecksumPart($content)
    {
        $treeHash = new TreeHash();
        $mb = 1024 * 1024 * pow(2, 0); // 1 MB (the class TreeHash only allows to process chunks <= 1 MB)
        $buffer = $content;

        while (strlen($buffer) >= $mb) {
            $data = substr($buffer, 0, $mb);
            $buffer = substr($buffer, $mb) ?: '';
            $treeHash->addData($data);
        }
        
        if (strlen($buffer)) {
            $treeHash->addData($buffer);
        }

        return $treeHash->getHash();
    }

    private function getChecksumFile($checksumParts)
    {
        $treeHash = TreeHash::fromChecksums($checksumParts);

        return $treeHash->getHash();
    }

    private function initiateMultipartUpload()
    {
        $result = $this->glacierClient->initiateMultipartUpload([
            'accountId' => '-',
            'vaultName' => $this->glacierConfig->vaultId,
            'archiveDescription' => $this->description,
            'partSize' => $this->partSize,
        ]);

        return $result->get('uploadId');
    }

    private function uploadFileToGlacier()
    {
        $success = true;
        $data = false;

        try {
            $fileSize = filesize($this->filePath);

            $uploadId = $this->initiateMultipartUpload();
            $checksums = $this->uploadMultipartFile($uploadId, $fileSize);
            $model = $this->completeMultipartUpload($uploadId, $fileSize, $checksums);

            $data = (object) [
                'archiveId' => $model->get('archiveId'),
                'executionDate' => $this->executionDate,
                'location' => $model->get('location'),
            ];
        } catch (\Exception $e) {
            $this->errorMessage = $e->getMessage();
            $success = false;
        }

        return [$success, $data];
    }
    
    private function uploadMultipartFile($uploadId, $fileSize)
    {
        $numParts = ceil($fileSize / $this->partSize);
        $fp = fopen($this->filePath, 'rb'); // binary-safe read mode
        $partIdx = 0;
        $checksumParts = [];

        error_log("Uploading: {$this->filePath} ({$fileSize} bytes) in {$numParts} parts...");

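        // stop once every byte is covered; comparing against $fileSize + 1 would send
        // an extra empty part when the size is an exact multiple of the part size
        while ($partIdx * $this->partSize < $fileSize) {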
            $firstByte = $partIdx * $this->partSize;
            $lastByte = min((($partIdx + 1) * $this->partSize) - 1, $fileSize - 1);
            $content = fread($fp, $this->partSize);
            
            // we compute the checksum of the part we're processing. It's important to send it as a parameter to
            // GlacierClient::uploadMultipartPart. AWS computes the checksum of the uploaded part on its side. If
            // that checksum doesn't match ours, the API throws an exception.
            $checksumPart = $this->getChecksumPart($content);

            $result = $this->glacierClient->uploadMultipartPart([
                'body' => $content,
                'uploadId' => $uploadId,
                'vaultName' => $this->glacierConfig->vaultId,
                'checksum' => $checksumPart,
                'range' => "bytes {$firstByte}-{$lastByte}/*"
            ]);

            $checksumParts[] = $result->get('checksum'); // same value as $checksumPart; the API throws an exception if they differ
            
            $partIdx++;
            error_log("Part {$partIdx} uploaded...");
        }

        return $checksumParts;
    }
}

$uploadMultipartFileToGlacier = new UploadMultipartFileToGlacier('<FILE_PATH>');

$uploadMultipartFileToGlacier->upload();