Python: gzip a file in memory and upload it to S3

Question

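I am using Python 2.7...

I am trying to concatenate two log files and use sed to extract the data for a specific date. I need to compress the result and upload it to S3 without creating any temporary files on the system.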

sed_command = "sed -n '/{}/,/{}/p'".format(last_date, last_date)

Flow:

cat the two files.

Example: cat file1 file2

Run the sed operation in memory.
Compress the result in memory using zip or gzip.
Upload the in-memory compressed file to S3.

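I have already done this successfully by creating temporary files on the system and deleting them once the upload to S3 completes. What I could not find is a working solution that does this on the fly, without creating any temporary files.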
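The flow described in the question (cat, sed range, compress) can be sketched entirely in memory. This is a minimal sketch, not the asker's code: it swaps the external `sed -n '/A/,/B/p'` call for a pure-Python range filter, and the helper names `read_lines`, `sed_range` and `gzip_to_buffer` are hypothetical. It uses `io.BytesIO`, which exists in both Python 2.7 and Python 3, and works on bytes throughout.

```python
import gzip
import io
import re


def read_lines(paths):
    """Yield the lines of each file in order, like `cat file1 file2`."""
    for path in paths:
        with open(path, "rb") as handle:
            for line in handle:
                yield line


def sed_range(lines, start_pattern, end_pattern):
    """Yield lines the way `sed -n '/start/,/end/p'` prints them.

    A range opens on a line matching start_pattern and closes on the
    next later line matching end_pattern (both lines are included).
    Patterns are bytes because the lines are bytes.
    """
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    in_range = False
    for line in lines:
        if in_range:
            yield line
            if end_re.search(line):
                in_range = False
        elif start_re.search(line):
            yield line
            in_range = True


def gzip_to_buffer(chunks, level=6):
    """Compress an iterable of bytes chunks into an in-memory buffer."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=level) as writer:
        for chunk in chunks:
            writer.write(chunk)
    buffer.seek(0)  # rewind so a later upload reads from the start
    return buffer
```

A call such as `gzip_to_buffer(sed_range(read_lines(["file1", "file2"]), b"2020-01-02", b"2020-01-02"))` returns a buffer that can be handed to an S3 upload call. Note that when the same pattern is used for both ends of the range, as in the `sed_command` above, the range closes on the second matching line, so a date with more than two matching lines may be cut short.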

Answer 1

The gist is as follows:

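    import sys
    import gzip
    import cStringIO
    import boto.s3.connection
    import boto.s3.key

    # aws_key, secret_key, bucket_name and key_path are supplied by the caller.
    conn = boto.s3.connection.S3Connection(aws_key, secret_key)
    bucket = conn.get_bucket(bucket_name, validate=True)
    buffer = cStringIO.StringIO()  # in-memory file, nothing touches the disk
    writer = gzip.GzipFile(None, 'wb', 6, buffer)  # filename, mode, compresslevel, fileobj
    writer.write(sys.stdin.read())  # e.g. the piped output of cat | sed
    writer.close()  # writes the gzip trailer into the buffer
    buffer.seek(0)  # rewind so the upload reads from the start
    boto.s3.key.Key(bucket, key_path).set_contents_from_file(buffer)
    buffer.close()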
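The same idea can be wrapped in reusable functions. The following is a rough sketch under a few assumptions: `gzip_stream` and `upload_gzipped` are hypothetical helper names, `key` is assumed to be a `boto.s3.key.Key` (or any object exposing `set_contents_from_file`), and `source` is assumed to be a binary file-like object. It copies the input in chunks instead of calling `read()` once, and uses `io.BytesIO` in place of `cStringIO` so that it also runs on Python 3.

```python
import gzip
import io
import shutil


def gzip_stream(source, level=6, chunk_size=64 * 1024):
    """Compress a binary file-like object into an in-memory buffer."""
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb", compresslevel=level) as writer:
        # Copy in chunks so the uncompressed input is never held whole.
        shutil.copyfileobj(source, writer, chunk_size)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


def upload_gzipped(key, source):
    """Compress source and pass the buffer to key.set_contents_from_file."""
    buffer = gzip_stream(source)
    try:
        key.set_contents_from_file(buffer)
    finally:
        buffer.close()
```

With boto this might be called as `upload_gzipped(boto.s3.key.Key(bucket, key_path), sys.stdin)` on Python 2.7, or with `sys.stdin.buffer` on Python 3. The compressed output is still held entirely in memory; only the uncompressed input is read in chunks.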

Answer 2

A somewhat late answer, but I recently published a package that does exactly this. It can be installed from PyPI:

    pip install aws-logging-handlers

You can find the usage documentation on git.
