Python + Wand.Image - saving output images to AWS with sequential pagenumber.jpg names

I'm writing a script that converts a PDF fetched from the internet (without saving it to disk) into a series of JPEGs, and then saves the JPGs to AWS S3.

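Unfortunately, the code below only saves the first page of the PDF to AWS as a JPG. Any ideas on how to modify it so that it saves every page to AWS with sequential filenames?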

from urllib2 import urlopen
from wand.image import Image
from io import BytesIO
import boto3

s3 = boto3.client(
    's3',
    aws_access_key_id='mykey',
    aws_secret_access_key='mykey'
)

bucket_name = 'testbucketAWS323'

# file prefix
test_id = 'example'
f = urlopen("https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf")
with Image(file=f) as img:
    print('pages = %d' % len(img.sequence))
    with img.convert('png') as converted:
        bytes_io_file = BytesIO(converted.make_blob('jpeg'))
        # code below should take 'converted' object, and save it to AWS as jpg
        s3.upload_fileobj(bytes_io_file, bucket_name, "assssd.jpg")
        print('done')

How about using the upload_fileobj method after the conversion?
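A minimal sketch of that suggestion, assuming `client` is a boto3 S3 client (any object with the same `upload_fileobj(Fileobj, Bucket, Key, ExtraArgs=...)` method works). The helper name `upload_jpeg` is hypothetical. One detail worth guarding against: `upload_fileobj` reads from the buffer's current position, so a `BytesIO` that has just been written to must be rewound with `seek(0)`, otherwise the upload can end up empty.

```python
from io import BytesIO


def upload_jpeg(client, bucket, key, jpeg_bytes):
    """Upload in-memory JPEG bytes to S3 without touching the disk.

    Hypothetical helper: `client` is assumed to be a boto3 S3 client or
    anything exposing upload_fileobj(Fileobj, Bucket, Key, ExtraArgs=...).
    """
    buf = BytesIO()
    buf.write(jpeg_bytes)
    # write() left the position at the end of the buffer; rewind so the
    # upload reads the whole image instead of zero bytes.
    buf.seek(0)
    client.upload_fileobj(buf, bucket, key,
                          ExtraArgs={'ContentType': 'image/jpeg'})
    return key
```

With the question's variables this would be called as `upload_jpeg(s3, bucket_name, 'example_1.jpg', converted.make_blob('jpeg'))`. Setting `ContentType` is optional; it just lets browsers display the object inline instead of downloading it.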

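Just enumerate the document's pages (wand.image.Image.sequence) to get both the page number and the page itself. Copy each page into a new Image instance and export its blob directly; there is no need for the intermediate PNG conversion.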
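Once the pages are enumerated, the only thing that has to change per page is the S3 key. A small naming sketch, assuming 1-based page numbers and zero-padding so that S3's lexicographic key listing matches page order (`page_key`, `width` and `start` are hypothetical names, not part of Wand or boto3):

```python
def page_key(prefix, page_index, width=3, start=1):
    """Build an S3 key such as 'example_001.jpg' from a 0-based page index.

    Zero-padding keeps lexicographic order (the order S3 lists keys in)
    equal to page order: 'example_010.jpg' sorts after 'example_009.jpg',
    whereas 'example_10.jpg' would sort before 'example_9.jpg'.
    """
    return '{0}_{1:0{2}d}.jpg'.format(prefix, page_index + start, width)


# Keys for a hypothetical 3-page document
keys = [page_key('example', i) for i in range(3)]
```

Inside a loop over `enumerate(document.sequence)`, `page_key(test_id, page_number)` would replace the fixed `"assssd.jpg"` key.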

from urllib2 import urlopen
from wand.image import Image
from io import BytesIO
import boto3

# ....

url = 'https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf'
resource = urlopen(url)
with Image(file=resource) as document:
    # document.sequence holds one frame per PDF page
    for page_number, page in enumerate(document.sequence):
        # Copy the single page into its own Image so only that page is encoded
        with Image(page) as img:
            bytes_io_file = BytesIO(img.make_blob('JPEG'))
            # page_number is 0-based: output_0.jpg, output_1.jpg, ...
            filename = 'output_{0}.jpg'.format(page_number)
            s3.upload_fileobj(bytes_io_file, bucket_name, filename)
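The code above is Python 2 (`urllib2`). A minimal Python 3 sketch of the same approach follows. It assumes Wand with ImageMagick and Ghostscript installed, an `s3` client and `bucket_name` set up as in the question, and uses hypothetical helper names (`render_pages`, `upload_pages`, `pdf_url_to_s3`). Rendering is kept separate from uploading so the upload loop can be exercised without AWS. The `resolution` argument sets the rasterization DPI; without it, ImageMagick typically renders PDFs at its default density of 72 DPI, which often looks blurry.

```python
from io import BytesIO
from urllib.request import urlopen


def render_pages(pdf_bytes, resolution=150):
    """Yield one JPEG blob per PDF page (requires Wand + ImageMagick)."""
    from wand.image import Image  # lazy import: upload_pages works without Wand
    with Image(blob=pdf_bytes, resolution=resolution) as document:
        for page in document.sequence:
            # Copy the single frame into its own Image, as in the answer
            with Image(image=page) as img:
                yield img.make_blob('jpeg')


def upload_pages(client, bucket, prefix, blobs):
    """Upload each blob as <prefix>_<n>.jpg, with n starting at 1."""
    keys = []
    for page_number, blob in enumerate(blobs, start=1):
        key = '{0}_{1}.jpg'.format(prefix, page_number)
        # A fresh BytesIO starts at position 0, so no seek() is needed here
        client.upload_fileobj(BytesIO(blob), bucket, key)
        keys.append(key)
    return keys


def pdf_url_to_s3(url, client, bucket, prefix):
    """Download a PDF into memory and upload every page as a JPEG."""
    with urlopen(url) as response:
        pdf_bytes = response.read()  # held in memory, never written to disk
    return upload_pages(client, bucket, prefix, render_pages(pdf_bytes))
```

Usage would look like `pdf_url_to_s3('https://s3.us-east-2.amazonaws.com/converted1jpgs/example.pdf', s3, bucket_name, 'example')`, which returns the list of uploaded keys (`example_1.jpg`, `example_2.jpg`, ...).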