Why does Windows have issues with the encoding, but Linux doesn't?

For my toy project mpu I have two CI solutions running.

The Windows build fails with this message:

_______________________________ test_read_json ________________________________

    def test_read_json():
        path = "files/example.json"
        source = pkg_resources.resource_filename(__name__, path)
        data_real = read(source)
    
        data_exp = {
            "a list": [1, 42, 3.141, 1337, "help", "�"],
            "a string": "bla",
            "another dict": {"foo": "bar", "key": "value", "the answer": 42},
        }
>       assert data_real == data_exp
E       AssertionError: assert {'a list': [1... answer': 42}} == {'a list': [1... answer': 42}}
E         Omitting 2 identical items, use -vv to show
E         Differing items:
E         {'a list': [1, 42, 3.141, 1337, 'help', '€']} != {'a list': [1, 42, 3.141, 1337, 'help', '�']}
E         Use -v to get the full diff

tests\test_io.py:175: AssertionError

Why can it read the € sign from the JSON file, but fail in the test? (Python 3.6)

I assume the read function used in the test wraps open in some way.

TL;DR: try adding encoding='utf8' to the call to open.
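A minimal sketch of that fix, assuming the read helper is a thin wrapper around open and json.load. The name read_json and its body are hypothetical, not mpu's actual code:

```python
import json


def read_json(path):
    """Hypothetical stand-in for the `read` helper used in the test."""
    # Passing the encoding explicitly makes the result independent of
    # the platform's locale default.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With the encoding pinned like this, the same file should decode to the same string on both Windows and Linux.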

In my experience, Windows doesn't always handle non-ASCII characters well when reading files unless the encoding is set explicitly.

It also doesn't help that the default value for encoding is platform-dependent:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
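To see which default applies on a given machine, something like the following can be run on each CI platform. This is only a sketch; the printed values depend on the system's locale settings:

```python
import locale
import tempfile

# The encoding open() falls back to when no encoding argument is passed.
default = locale.getpreferredencoding(False)
print(default)  # e.g. cp1252 on a Western European Windows install, UTF-8 on most Linux systems

# A text-mode file object reports the encoding it actually uses.
with tempfile.TemporaryFile("w+") as f:
    print(f.encoding)
```

If the two platforms print different values here, any open call without an explicit encoding can decode the same bytes differently.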

Some tests (running Win 10, Python 3.7, locale.getpreferredencoding() returns cp1252):

test.csv (saved as UTF-8) contains only:

€


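# Default encoding (cp1252 on this machine): the UTF-8 bytes are mis-decoded.
with open('test.csv') as f:
    print(f.read())

# â‚¬

# Explicit UTF-8: the file is decoded correctly.
with open('test.csv', encoding='utf8') as f:
    print(f.read())

# €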
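The same effect can be reproduced on any platform without touching a file, by decoding the UTF-8 bytes of the euro sign by hand. The variable names below are only illustrative:

```python
# "\u20ac" is the euro sign; UTF-8 encodes it as three bytes.
euro_bytes = "\u20ac".encode("utf-8")    # b'\xe2\x82\xac'

# Decoding those bytes as cp1252 maps each byte to its own character.
as_cp1252 = euro_bytes.decode("cp1252")  # 'â‚¬' (three characters)

# Decoding them as UTF-8 recovers the original single character.
as_utf8 = euro_bytes.decode("utf-8")     # '€'
```

This is why the mis-decoded value is three characters long: one for each byte of the UTF-8 sequence.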