Save a data frame to S3 in feather format
I have a data frame, say:
import pandas as pd
df = pd.DataFrame({'a': [1, 4], 'b': [1, 3]})
I want to save it to S3 as a feather file, but I can't find a working way to do it.
Any suggestions?
You can use storefact / simplekv, which avoids writing to disk.
import pyarrow as pa
from pyarrow.feather import write_feather
import storefact
df = …
store = storefact.get_store('hs3', host="…", bucket="…", access_key="…", secret_key="…")
buf = pa.BufferOutputStream()
write_feather(df, buf)
# Upload the serialized bytes through the store created above
store.put('filename.feather', buf.get_result().to_pybytes())
The solution that worked for me is:
import boto3
import pandas as pd
from io import BytesIO
from pyarrow.feather import write_feather
df = pd.DataFrame({'a': [1, 4], 'b': [1, 3]})
s3_resource = boto3.resource('s3')
with BytesIO() as f:
    write_feather(df, f)
    # Upload while the buffer is still open; getvalue() fails on a closed BytesIO
    s3_resource.Object('bucket-name', 'file_name').put(Body=f.getvalue())
A simple solution using only Pyarrow and Pandas:
import pandas as pd
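import pyarrow as pa
# Import the submodules explicitly so that pa.fs and pa.feather resolve
import pyarrow.feather
import pyarrow.fs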
s3 = pa.fs.S3FileSystem(region='us-east-1')
df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
with s3.open_output_stream('my-bucket/path/to.feather') as f:
    pa.feather.write_feather(df, f)