UnicodeDecodeError when using python 2.7 code on python 3.7 with cPickle

I'm trying to use cPickle on a .pkl file built from a "parsed" .csv file. The parsing is done with a pre-built Python toolbox (https://github.com/GEMScienceTools/gmpe-smtk) that was recently ported from Python 2 to Python 3.

The code I'm using is as follows:

from smtk.parsers.esm_flatfile_parser import ESMFlatfileParser
parser=ESMFlatfileParser.autobuild("Database10","Metadata10","C:/Python37/TestX10","C:/Python37/NorthSea_Inc_SA.csv")
import cPickle
sm_database = cPickle.load(open("C:/Python37/TestX10/metadatafile.pkl","r"))

It returns the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 44: character maps to <undefined>

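As far as I understand, I need to specify the encoding of the .pkl file for cPickle to work, but I don't know what encoding the file produced by parsing the .csv has, so I currently can't do that.

Using Sublime Text I found that it is "hexadecimal", but that isn't an encoding that Python 3.7 accepts, is it?

If anyone knows how to determine the required encoding, or how to make the hexadecimal encoding usable in Python 3.7, I would greatly appreciate it.

P.S. The modules used, such as "ESMFlatfileParser", are part of the pre-built toolbox. With that in mind, is it possible that I need to change the encoding somehow inside this module?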

The code opens the file in text mode ('r'), but it should be opened in binary mode ('rb').
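To illustrate where the error message probably comes from: in text mode, Python decodes the raw bytes with the platform's default encoding before pickle ever sees them. The sketch below assumes that default is cp1252, which is common on Windows but is only a guess about the asker's machine. The byte 0x81 has no character assigned in cp1252, so decoding it fails with the same 'charmap' message.

```python
# A single byte that has no character assigned in cp1252.
raw = b"\x81"

try:
    # This mimics what text mode does to the file contents
    # (assumption: the default encoding is cp1252).
    raw.decode("cp1252")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

So the file is not in some special encoding; a pickle is binary data and should not be decoded as text at all.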

From the documentation for pickle.load (emphasis mine):

[The] file can be an on-disk file opened for binary reading, an io.BytesIO object, or any other custom object that meets this interface.

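Because the file is opened in binary mode, there is no need to pass an encoding argument to open. It may be necessary to pass an encoding argument to pickle.load, though. From the same documentation: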

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects. Using encoding='latin1' is required for unpickling NumPy arrays and instances of datetime, date and time pickled by Python 2.
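As a rough illustration of what the encoding argument changes, here is a minimal sketch. The byte string py2_stream is hand-written for this example (it is not taken from the asker's file) and resembles what Python 2 emits at protocol 2 for the one-character 8-bit string '\x81'.

```python
import pickle

# Hand-written protocol-2 stream for the Python 2 str '\x81'
# (opcodes: PROTO 2, SHORT_BINSTRING, BINPUT, STOP). Illustrative only.
py2_stream = b"\x80\x02U\x01\x81q\x00."

# With the default encoding ('ASCII') the 8-bit string cannot be decoded.
try:
    pickle.loads(py2_stream)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte to a character, so decoding always succeeds.
as_text = pickle.loads(py2_stream, encoding="latin1")

# 'bytes' skips decoding and returns the raw bytes object.
as_bytes = pickle.loads(py2_stream, encoding="bytes")
```

Which of the two is appropriate depends on what the toolbox stored in the file, which I can't tell from the question.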

This should prevent the UnicodeDecodeError:

sm_database = cPickle.load(open("C:/Python37/TestX10/metadatafile.pkl","rb"))
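A slightly more defensive variant is sketched below. It uses the standard pickle module, since Python 3 has no separate cPickle module in its standard library, and a with block so the file is closed afterwards. The helper name load_legacy_pickle and the latin1 default are my own choices for illustration, not something from the toolbox.

```python
import pickle


def load_legacy_pickle(path, encoding="latin1"):
    """Load a pickle from `path`, opening the file in binary mode.

    `encoding` only affects 8-bit strings pickled by Python 2;
    latin1 is an assumed default here, not a requirement.
    """
    with open(path, "rb") as handle:  # 'rb': raw bytes, no text decoding
        return pickle.load(handle, encoding=encoding)
```

It could then be called as sm_database = load_legacy_pickle("C:/Python37/TestX10/metadatafile.pkl"). If the file was written by Python 3 in the first place, the encoding argument should make no difference.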