Delete files from an S3 bucket based on names listed in a file, using the CLI

Question

I am trying to delete multiple (for example, thousands of) files from an Amazon S3 bucket. I have the file names listed in a file, like so:

name1.jpg
name2.jpg
...
name2020201.jpg

I tried the following solution:

aws s3 rm s3://test-bucket --recursive --exclude "*" --include "data/*.*"

This came from another answer, but --include takes only one argument. I tried to get hacky and list the names like --include "name1.jpg", but that does not work either.

This approach does not work either:

aws s3 rm s3://test-bucket < file.txt

Can you help?

Answer 1

I solved this with a simple bash script:

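#!/bin/bash
set -e
# Read one object key per line; -r keeps backslashes literal.
while IFS= read -r line
do
   # Quote the URI so keys containing spaces are passed as one argument.
   aws s3 rm "s3://test-bucket/$line"
done < files.txt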

Inspired by this answer. In short: delete the files one at a time!
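
A list file edited on Windows may carry CRLF line endings or blank lines, and either would make the loop above ask for keys that do not exist. The following is only a minimal sketch of how the list could be cleaned and the deletion previewed first. The function names normalize_keys and remove_listed are hypothetical and not part of the original answer; --dryrun is the preview option of aws s3 rm.

```bash
#!/bin/bash
# Sketch only: normalize_keys and remove_listed are illustrative names.

# Reads a key list on stdin, strips a trailing carriage return from each
# line, drops blank lines, and prints the cleaned keys one per line.
normalize_keys() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%$'\r'}
        [ -n "$line" ] && printf '%s\n' "$line"
    done
    return 0
}

# usage: remove_listed BUCKET LISTFILE [extra "aws s3 rm" options]
# Deletes the listed keys one at a time, as in the script above.
remove_listed() {
    local bucket=$1 list=$2 key
    shift 2
    while IFS= read -r key; do
        # stdin is redirected so aws cannot consume the key list.
        aws s3 rm "s3://$bucket/$key" "$@" < /dev/null
    done < <(normalize_keys < "$list")
}
```

Running remove_listed test-bucket files.txt --dryrun prints what would be deleted without deleting anything; dropping --dryrun performs the deletion.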

Answer 2

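The following approach is actually much faster, since my first answer took a very long time to finish.

My first approach used the rm command to delete one line at a time. That is not efficient. After about 15 hours (!) it had deleted only about 40,000 records, roughly 1/5 of the total.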

This approach by Norbert Preining is far faster. As he explains, it uses the s3api command delete-objects, which can bulk-delete objects from a bucket. The command takes a JSON object as its argument. To turn the list of file names into the required JSON object, the script uses jq, a command-line JSON processor. The script processes 500 records per iteration.

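# Read the list in batches of 500 names; stop once a batch comes back empty.
cat file-with-names | while mapfile -t -n 500 ary && ((${#ary[@]})); do
        # Build {"Objects":[{"Key":"..."},...]} from the batch (jq-win64.exe is the Windows build of jq).
        objdef=$(printf '%s\n' "${ary[@]}" | ./jq-win64.exe -nR '{Objects: (reduce inputs as $line ([]; . + [{"Key":$line}]))}')
        # BUCKET is a placeholder for the bucket name.
        aws s3api --no-cli-pager delete-objects --bucket BUCKET --delete "$objdef"
done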
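
Where jq is not available, the same batching idea can be sketched in plain bash. This is only an illustration under stated assumptions, not the method from the linked post: the function names build_delete_payload and delete_listed_keys are hypothetical, and the escaping handles only backslashes and double quotes, so keys are assumed to contain no control characters. The batch size defaults to 500 as above; delete-objects itself accepts at most 1000 keys per request.

```bash
#!/bin/bash
# Sketch only: both function names are illustrative.

# Reads object keys from stdin (one per line) and prints the JSON payload
# {"Objects":[{"Key":"..."},...],"Quiet":true} expected by delete-objects.
build_delete_payload() {
    local key sep=""
    printf '{"Objects":['
    while IFS= read -r key || [ -n "$key" ]; do
        [ -z "$key" ] && continue      # skip blank lines
        key=${key//\\/\\\\}            # escape backslashes first
        key=${key//\"/\\\"}            # then escape double quotes
        printf '%s{"Key":"%s"}' "$sep" "$key"
        sep=","
    done
    printf '],"Quiet":true}\n'
}

# usage: delete_listed_keys BUCKET LISTFILE [BATCH_SIZE]
# Sends one delete-objects request per batch of keys.
delete_listed_keys() {
    local bucket=$1 list=$2 size=${3:-500} batch
    while mapfile -t -n "$size" batch && ((${#batch[@]})); do
        # stdin is redirected so aws cannot consume the key list.
        aws s3api --no-cli-pager delete-objects --bucket "$bucket" \
            --delete "$(printf '%s\n' "${batch[@]}" | build_delete_payload)" \
            < /dev/null
    done < "$list"
}
```

A call such as delete_listed_keys test-bucket file-with-names would then issue one request per 500 names. With "Quiet" set to true, the response lists only the keys that failed to delete.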
