Storing ndarrays into Parquet via uber/petastorm?

python
arrays
matrix
parquet
petastorm

Is it possible to store n-dimensional arrays in Parquet via uber/petastorm?

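Yes. Petastorm provides a custom codec layer and schema extensions on top of the standard Apache Parquet format. N-dimensional arrays/tensors are serialized into binary blob fields. From the user's point of view they look like native types, depending on the environment you work in: numpy arrays in pure Python/pyspark, tf.Tensor in TensorFlow, or torch Tensors in PyTorch.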

There are some simple, easy-to-follow examples here: https://github.com/uber/petastorm/tree/master/examples/hello_world/petastorm_dataset