Unusual behavior with the built-in decode method (aiohttp is used as well)

Question

I am trying to scrape an entire page, and I expected both versions below to work. This is the code that does not work:

import aiohttp
import asyncio

url = "https://unsplash.com/s/photos/dogs"

async def main():
    async with aiohttp.ClientSession() as s:
        async with s.get(url) as r:
            enc = str(r.get_encoding())
            bytes = await r.read()  # returns <class 'bytes'>
            with open("stuff.html", "w") as f:
                # for errors I've tried all possible accepted values
                f.write(bytes.decode(encoding=enc, errors="ignore"))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

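This results in UnicodeEncodeError: 'charmap' codec can't encode character '\u2713' in position 58100: character maps to <undefined>. I assume the character at that position cannot, for some reason, be decoded and converted to a string. Changing the main function to the following makes it work.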

async def main():
    async with aiohttp.ClientSession() as s:
        async with s.get(url) as r:
            enc = str(r.get_encoding())
            bytes = await r.read()
        with open("stuff.html", "wb") as f:
            f.write(bytes)

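I am not sure why the first version does not work. In the second code block I simply use a context manager to write the bytes to a file named stuff.html. In the first code block I do the same thing the long way: I decode the bytes with the decode() method into a string and write that string to the file. So it should not matter whether I open the file with wb or w.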

Answer 1

If no explicit encoding is set in the open() call, f.write(string) encodes the string to bytes using the system default encoding before actually writing.
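A minimal sketch of that implicit encoding step. It passes encoding="cp1252" explicitly only to imitate a typical Windows default on any platform (an assumption; the real default depends on the machine), and the file name demo.txt is hypothetical.

```python
import os
import tempfile

text = "done \u2713"  # contains U+2713, the character named in the error

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

# Text mode: write() has to encode the str to bytes first.
# cp1252 is passed explicitly to imitate a common Windows default.
try:
    with open(path, "w", encoding="cp1252") as f:
        f.write(text)
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)  # names the 'charmap' codec, as in the question

# Same text with an explicit utf-8 encoding: the write succeeds.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "rb") as f:
    raw = f.read()
```

The exception comes from the write, not from any decode call, which matches the UnicodeEncodeError (rather than UnicodeDecodeError) reported in the question.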

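On Windows, the default encoding for text files is the locale's ANSI code page (see locale.getpreferredencoding()), for example cp1252; it is not utf-8. Python implements such code pages as charmap codecs, which can represent only a small subset of Unicode, and '\u2713' (a check mark) is not in that subset. That is why you see the error. Note that it is a UnicodeEncodeError raised by f.write(), so the decode() call itself succeeded.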
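One way to apply this to the question's code, as a sketch: keep the decode step but pass an explicit encoding to open(). The helper name save_html is hypothetical, and the sample bytes stand in for the result of `await r.read()`.

```python
import os
import tempfile


def save_html(raw: bytes, source_encoding: str, path: str) -> None:
    """Decode downloaded bytes and write them as UTF-8 text (hypothetical helper)."""
    text = raw.decode(source_encoding, errors="replace")
    # encoding="utf-8" stops open() from falling back to the locale default;
    # newline="" leaves line endings exactly as they were downloaded.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Stand-ins for `await r.read()` and `r.get_encoding()` from the question.
sample = "<p>ok \u2713</p>".encode("utf-8")
out_path = os.path.join(tempfile.mkdtemp(), "stuff.html")
save_html(sample, "utf-8", out_path)

with open(out_path, "rb") as f:
    written = f.read()
```

Writing the raw bytes in "wb" mode, as in the question's second version, avoids the problem in a different way: no encoding step happens at all.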

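There has been discussion about switching the Windows default encoding to utf-8, but the switch raises backward-compatibility concerns, so it had not been made at the time of writing.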
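To see what a given machine uses in the meantime, a small sketch like the following can help. It only inspects the interpreter. The opt-in UTF-8 mode it reports (`python -X utf8` or the PYTHONUTF8=1 environment variable) is not mentioned in the answer and is added here as background.

```python
import locale
import sys

# Encoding that open() falls back to for text files when encoding= is omitted
# (on Python 3.9, the version tagged in the question).
default_encoding = locale.getpreferredencoding(False)

# 1 when the interpreter was started with `python -X utf8` or PYTHONUTF8=1.
utf8_mode = sys.flags.utf8_mode

print(f"default text encoding: {default_encoding}")
print(f"UTF-8 mode enabled:    {bool(utf8_mode)}")
```

Passing encoding= explicitly to open() remains the more portable choice, since it does not depend on how the interpreter was launched.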

The current state of file encodings is described in the Python docs for Windows.

python

encoding

python-3.x

aiohttp

python-3.9