Unicode support in requests

Question

I am using the requests library to read a text file from a URL, for example:

import requests
response = requests.get('https://example.com/file.txt')
result = response.text.splitlines()  # splitlines() already returns a list

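However, some lines of the file contain Unicode characters (Slavic letters). After reading, the string looks corrupted, so I cannot use it properly. What is the correct way to handle this?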

Answer 1

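response.text is decoded using response.encoding. In the absence of a correct Content-Type: ...; charset=... header, that encoding is guessed internally by Requests using the chardet or charset_normalizer modules (the guess is exposed as response.apparent_encoding). (As far as I know, this guessing is slated to be deprecated in a later version of Requests.)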
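To see which of these values is responsible for a garbled result, it can help to print them side by side. The following is only a sketch: describe_encoding is a hypothetical helper name, and the stand-in object and its values are illustrative assumptions so the snippet runs without a network call. With a real response you would pass the object returned by requests.get(...) instead.

```python
from types import SimpleNamespace


def describe_encoding(response) -> dict:
    """Collect the values that decide how response.text is decoded.

    Hypothetical helper: works with any object exposing .headers,
    .encoding and .apparent_encoding, such as a requests.Response.
    """
    return {
        # Raw header as sent by the server; it may lack a charset parameter.
        "content_type": response.headers.get("Content-Type"),
        # Encoding Requests derived from the headers (may be None).
        "encoding": response.encoding,
        # Encoding guessed from the body by chardet / charset_normalizer.
        "apparent_encoding": response.apparent_encoding,
    }


# Stand-in for a real response so the sketch runs offline.
# The values below are illustrative assumptions, not captured output.
fake = SimpleNamespace(
    headers={"Content-Type": "text/plain"},
    encoding="ISO-8859-1",
    apparent_encoding="utf-8",
)
info = describe_encoding(fake)
for key, value in info.items():
    print(f"{key}: {value}")
```

If encoding and apparent_encoding disagree, as in this made-up case, that is a hint that the header-derived encoding may be the wrong one for the body.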

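Sometimes the autodetection fails and you get mojibake (as you did). In that case you need to figure out the actual encoding yourself, and then either:

call response.content.decode('some-encoding'), or
set response.encoding = 'some-encoding', after which response.text will use that encoding.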
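Both options can be sketched as follows. This is a minimal illustration, not a definitive recipe: it assumes the file is actually UTF-8 (substitute whatever encoding the file really uses, for example 'cp1251'), and decode_lines and fetch_lines are hypothetical helper names. The requests import is done lazily inside fetch_lines so that the decoding step can be tried offline.

```python
from __future__ import annotations


def decode_lines(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Option 1: decode the raw body bytes with an explicit encoding."""
    return raw.decode(encoding).splitlines()


def fetch_lines(url: str, encoding: str = "utf-8") -> list[str]:
    """Option 2: tell Requests which encoding response.text should use."""
    import requests  # imported here so decode_lines works without requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Override whatever Requests derived from the headers or guessed.
    response.encoding = encoding
    return response.text.splitlines()


# Offline demonstration of option 1 with sample Cyrillic text.
sample = "Привет\nмир".encode("utf-8")
lines = decode_lines(sample)

# Decoding the same bytes with the wrong encoding yields mojibake.
garbled = sample.decode("iso-8859-1").splitlines()
print(lines)
```

For the code in the question, this would amount to something like result = fetch_lines('https://example.com/file.txt'), assuming the file is UTF-8.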


python

unicode

request