How to read data serialized by Python 2 cPickle with Python 3 pickle?
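I am trying to use the CIFAR-10 dataset, which provides a special version for Python.
It is a set of binary files, each representing a dictionary that holds 10k numpy matrices. The files were evidently created by Python 2's cPickle.
I tried loading one from Python 2 as follows: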
import cPickle
with open("data/data_batch_1", "rb") as f:
    data = cPickle.load(f)
That works great. However, if I try to load the data from Python 3 (which has no cPickle, only pickle), it fails:
import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f)
The load fails with the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
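Can I somehow convert the original dataset into a new one that is readable from Python 3? Or can I somehow read it directly from Python 3?
I tried loading it with cPickle, dumping it to json, and reading it back with pickle, but numpy matrices evidently cannot be written to a json file.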
You need to tell pickle which codec to use for those bytestrings, or tell it to load the data as bytes instead. From the pickle.load() documentation:
The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
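The effect of the encoding argument can be reproduced without the dataset. The following is a minimal sketch, not taken from the CIFAR-10 files: `PY2_PICKLE` is a hand-written pickle stream, made up for illustration, that holds one Python 2 style 8-bit string containing the byte 0x8b, and `try_default` is a hypothetical helper name.

```python
import pickle

# Hand-written pickle stream, invented for this illustration:
# SHORT_BINSTRING opcode (b"U"), a length byte (3), the payload b"\x8bab",
# then STOP (b"."). Python 2 uses this opcode for its 8-bit str type.
PY2_PICKLE = b"U\x03\x8bab."


def try_default(payload):
    """Unpickle with the default encoding ('ASCII'); return the error, if any."""
    try:
        pickle.loads(payload)
    except UnicodeDecodeError as exc:
        return exc
    return None


# Default settings: the byte 0x8b is not valid ASCII, so decoding fails.
default_error = try_default(PY2_PICKLE)

# encoding="bytes": the 8-bit string is returned untouched as a bytes object.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

# encoding="latin1": every byte maps to one code point, so decoding cannot fail.
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin1")

print(type(default_error).__name__)
print(as_bytes)
print(as_latin1.encode("latin1") == as_bytes)
```

Using "latin1" yields str objects instead of bytes. Which of the two is more convenient depends on how the loaded strings are used afterwards.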
To load the strings as bytes objects:
import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f, encoding='bytes')
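One consequence of encoding='bytes' is that any Python 2 str used as a dictionary key also comes back as bytes, so a key has to be looked up as b"..." rather than "...". The sketch below shows one possible way to deal with that. `load_py2_pickle` and `decode_keys` are hypothetical helper names, and the `sample` dictionary with its key names is a made-up stand-in for a loaded batch, not the real file contents.

```python
import pickle


def load_py2_pickle(path):
    """Load a pickle written by Python 2, keeping its 8-bit strings as bytes."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")


def decode_keys(mapping, encoding="ascii"):
    """Return a copy of mapping whose top-level bytes keys are decoded to str.

    Values are left untouched, and nested dictionaries are not processed.
    """
    return {
        key.decode(encoding) if isinstance(key, bytes) else key: value
        for key, value in mapping.items()
    }


# Made-up stand-in for what load_py2_pickle("data/data_batch_1") might return.
sample = {b"data": [1, 2, 3], b"labels": [0, 1, 2]}

decoded = decode_keys(sample)
print(sorted(decoded))  # ['data', 'labels']
```

After decoding the keys once, the result can be written out again with Python 3's pickle.dump if a converted copy of the dataset is wanted.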