Compression option in fastparquet is not consistent
According to the project page of fastparquet, fastparquet supports various compression methods:
Optional (compression algorithms; gzip is always available):
snappy (aka python-snappy)
lzo
brotli
lz4
zstandard
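In particular, zstandard is a modern algorithm that offers a high compression ratio together with impressively fast compression/decompression speed. That is what I want to use with fastparquet. However, the documentation for the compression argument says: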
compression to apply to each column, e.g. GZIP or SNAPPY or a dict
like {"col1": "SNAPPY", "col2": None} to specify per column
compression types. In both cases, the compressor settings would be the
underlying compressor defaults. To pass arguments to the underlying
compressor, each dict entry should itself be a dictionary:
{
    col1: {
        "type": "LZ4",
        "args": {
            "compression_level": 6,
            "content_checksum": True
        }
    },
    col2: {
        "type": "SNAPPY",
        "args": None
    },
    "_default": {
        "type": "GZIP",
        "args": None
    }
}
There is no mention of zstandard. Worse, if I write
fastparquet.write('outfile.parq', df, compression='LZ4')
an error pops up saying
Compression 'LZ4' not available. Options: ['GZIP', 'UNCOMPRESSED']
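So fastparquet only supports 'GZIP'? That is a big discrepancy from the project page! Am I missing some packages? How can I use fastparquet with all the compression algorithms listed on the project page?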
Yes, you are probably missing some packages. Your system must first have the Python LZ4 and/or zstandard bindings installed. See the source code for details.
For LZ4: if import lz4.block raises ModuleNotFoundError, go ahead and install it with pip install lz4.
Similarly for zstandard: pip install zstandard
For brotli: pip install brotlipy
For lzo: pip install python-lzo
And for snappy: pip install python-snappy
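To see at a glance which of these bindings are missing, something like the following sketch may help. It only checks whether a module can be imported and does not touch fastparquet itself. The module names in CANDIDATES are my assumption about what each pip package provides (for example, that brotlipy exposes a module named brotli); the modules fastparquet actually imports can differ between versions, so check the source code if a result looks wrong. The names is_importable and missing_packages are made up for this example.

```python
import importlib.util

# Assumed mapping: codec label -> (module to look for, pip package to install).
# These module names are assumptions, not taken from fastparquet's source.
CANDIDATES = {
    "SNAPPY": ("snappy", "python-snappy"),
    "LZO": ("lzo", "python-lzo"),
    "BROTLI": ("brotli", "brotlipy"),
    "LZ4": ("lz4.block", "lz4"),
    "ZSTD": ("zstandard", "zstandard"),
}


def is_importable(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises ModuleNotFoundError for "pkg.sub" when "pkg"
        # itself is not installed, so treat that as "not importable".
        return False


def missing_packages(candidates=CANDIDATES):
    """Map each codec label whose module is missing to its pip package name."""
    return {
        label: pip_name
        for label, (module_name, pip_name) in candidates.items()
        if not is_importable(module_name)
    }


if __name__ == "__main__":
    for label, pip_name in missing_packages().items():
        print(f"{label}: not found, try: pip install {pip_name}")
```

After installing whatever it reports, restart the Python session and retry the fastparquet.write call; the "Options" list in the error message should grow accordingly.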