How to read data serialized by Python 2 cPickle with Python 3 pickle?

I am trying to use the CIFAR-10 dataset, which provides a special version for Python.

It is a set of binary files, each representing a dictionary that contains 10k numpy matrices. The files were apparently created by Python 2 cPickle.

I tried loading it from Python 2 as follows:

import cPickle
with open("data/data_batch_1", "rb") as f:
    data = cPickle.load(f)

This works really well. However, if I try to load the data from Python 3 (which has no cPickle, only pickle), it fails:

import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f)

It fails with the following error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

Can I somehow convert the original dataset into a new one that is readable from Python 3? Or can I somehow read it directly from Python 3?

I tried loading it with cPickle, dumping it to JSON, and reading it back with pickle, but numpy matrices apparently cannot be written to a JSON file.

You need to tell pickle which codec to use for those bytestrings, or tell it to load the data as bytes instead. From the pickle.load() documentation:

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
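As a rough illustration of the quoted behaviour, the sketch below reproduces the error on a tiny hand-written pickle rather than on the real CIFAR-10 files. PY2_PICKLE and try_default are made up for this example (assumptions, not part of the dataset or of pickle). The encoding='latin1' choice follows the note in the Python documentation that this codec is needed for NumPy arrays pickled by Python 2.

```python
import pickle

# Hypothetical stand-in for a Python 2 pickle: the protocol 2 encoding of
# {'data': '\x8b\x00\x01'}, where both strings are Python 2 8-bit str objects.
PY2_PICKLE = b'\x80\x02}q\x00U\x04dataq\x01U\x03\x8b\x00\x01q\x02s.'


def try_default(raw):
    """Return the message of the error raised by default ASCII decoding, or None."""
    try:
        pickle.loads(raw)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# The default encoding ('ASCII', 'strict') cannot decode the byte 0x8b.
default_error = try_default(PY2_PICKLE)
print(default_error)

# Option 1: name a codec. latin1 maps every byte to a code point, so it never fails.
as_latin1 = pickle.loads(PY2_PICKLE, encoding='latin1')
print(as_latin1)
```

With a codec such as latin1 the dictionary keys come back as ordinary str objects, which may be more convenient than bytes keys if the rest of the code indexes the dictionary by name.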

To load the strings as bytes objects:

import pickle
with open("data/data_batch_1", "rb") as f:
    data = pickle.load(f, encoding='bytes')
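One consequence of encoding='bytes' is that the dictionary keys are bytes as well, so lookups need a b'...' key. The sketch below shows this on a small hand-written pickle and then re-saves the result as a native Python 3 pickle, which is one possible way to do the conversion asked about. PY2_PICKLE and decode_keys are hypothetical names introduced for this example, and the ASCII key encoding is an assumption.

```python
import pickle

# Hypothetical stand-in for a Python 2 pickle: the protocol 2 encoding of
# {'data': '\x8b\x00\x01'}, where both strings are Python 2 8-bit str objects.
PY2_PICKLE = b'\x80\x02}q\x00U\x04dataq\x01U\x03\x8b\x00\x01q\x02s.'


def decode_keys(d, encoding='ascii'):
    """Return a copy of d whose bytes keys are decoded to str (hypothetical helper)."""
    return {
        (k.decode(encoding) if isinstance(k, bytes) else k): v
        for k, v in d.items()
    }


# Python 2 strings are kept as raw bytes, including the dictionary keys.
batch = pickle.loads(PY2_PICKLE, encoding='bytes')
print(batch[b'data'])

# Decode only the keys; the values stay as raw bytes.
named = decode_keys(batch)
print(named['data'])

# Re-serialize with Python 3 so later loads need no encoding argument.
converted = pickle.dumps(named)
reloaded = pickle.loads(converted)
```

For a real batch file the same steps would apply to the object returned by pickle.load(f, encoding='bytes'), with pickle.dump writing the converted dictionary to a new file.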