Why does `toml.load(f)` fail with this file under Windows (but not on Linux)?

Question

I have a TOML file which I want to process with this script.

This used to work fine under Linux. Under Windows (Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:23:52) [MSC v.1900 32 bit (Intel)] on win32), I get the following error:

Need to process 1 file(s)
Processing file test01.toml (1 of 1)
Traceback (most recent call last):
  File "py/process.py", line 27, in <module>
    add_text_fragment(input_dir + "/" + file)
  File "<string>", line 10, in add_text_fragment
  File "C:\Users\Anaconda3\lib\site-packages\toml\decoder.py", line 134, in load
    return loads(f.read(), _dict, decoder)
  File "C:\Users\Anaconda3\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 985: character maps to <undefined>

I assume that the error occurs here:

f = open(toml_file_name, "r")
pt = toml.load(f)
f.close()

According to Notepad++, the file in question is UTF-8 encoded.

How can I fix this?

Bounty terms

I will award this bounty to the person who shows me how to make sure that the script process.py correctly processes the input file, i.e. that execution gets past the comment starting with "If at this point pt" in addTextFragment.py

def add_text_fragment(toml_file_name):
    f = open(toml_file_name, "r")
    pt = toml.load(f)
    f.close()

    # If at this point pt contains the data of the input file,
    # then you have attained the goal.
    if (pt["type"] == "TA"):

and the variable pt contains the data from the input file.

Your solution must work on Windows 10 with Python 3.7.6 (default, Jan 8 2020, 20:23:39) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32.

Note: process.py executes addTextFragment.py for every file in a particular directory.

Answer 1

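The decoding fails inside the f.read() call that toml.load makes on your file object. The file was opened in text mode without an explicit encoding, so Python falls back to the system default (cp1251 in your traceback). As you said, the data in your toml file is UTF-8 encoded. I would read the file in binary mode and decode it manually, so that nothing depends on the platform's default charset.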

with open(toml_file_name, 'rb') as f:
    pt = toml.loads(f.read().decode('utf-8'))
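A slightly more defensive variant of the same idea, as a sketch only: Notepad++ can save UTF-8 files with or without a byte order mark (BOM), and the question does not say which applies here. Assuming a BOM might be present, the standard "utf-8-sig" codec decodes both forms. The helper name read_utf8_text and the sample content are hypothetical.

```python
import os
import tempfile


def read_utf8_text(path):
    """Return the file's text, decoded as UTF-8 regardless of the platform default.

    "utf-8-sig" behaves like "utf-8" but also drops a leading
    byte order mark (BOM) if one is present.
    """
    with open(path, "rb") as f:
        return f.read().decode("utf-8-sig")


def demo():
    # Hypothetical content standing in for the real input file.
    expected = 'type = "TA"\n'
    fd, path = tempfile.mkstemp(suffix=".toml")
    try:
        with os.fdopen(fd, "wb") as f:
            # Write a BOM followed by the UTF-8 encoded text.
            f.write(b"\xef\xbb\xbf" + expected.encode("utf-8"))
        return read_utf8_text(path), expected
    finally:
        os.remove(path)
```

The returned string can then be passed to toml.loads, exactly as in the snippet above.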

Answer 2

Just replace this line:

f = open(toml_file_name, "r")

with:

f = open(toml_file_name, "r", encoding="utf-8")

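As you can see in the error message (the traceback ends in cp1251.py), Python is trying to read the file using the system's default encoding. If the file contains any non-ASCII characters and works under Linux, the default encoding there must be a different one: on most non-Windows systems the default is UTF-8, which happens to match the file.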
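To see the mismatch on any platform, here is a minimal sketch that forces the cp1251 codec instead of relying on the system default. The sample content is hypothetical (the real test01.toml is not shown); it only needs to contain a character whose UTF-8 encoding includes byte 0x98, such as the Cyrillic "И" (bytes 0xD0 0x98).

```python
import locale
import os
import tempfile

# Hypothetical sample content; "И" encodes to the bytes 0xD0 0x98 in UTF-8,
# and 0x98 has no mapping in cp1251.
SAMPLE = 'type = "TA"\ntitle = "История"\n'


def read_with(path, encoding):
    """Read a text file using an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.read()


def demo():
    fd, path = tempfile.mkstemp(suffix=".toml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(SAMPLE.encode("utf-8"))
        try:
            read_with(path, "cp1251")
            cp1251_ok = True
        except UnicodeDecodeError:
            # Same class of failure as in the traceback from the question.
            cp1251_ok = False
        utf8_text = read_with(path, "utf-8")
    finally:
        os.remove(path)
    return cp1251_ok, utf8_text


if __name__ == "__main__":
    # What open() uses when no encoding argument is given.
    print("default encoding:", locale.getpreferredencoding(False))
    print(demo())
```

Running the script on both machines and comparing the printed default encoding should show where the two environments differ.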


python

windows

character-encoding

python-3.x

toml