Python default character encoding handling

Question

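I have seen several posts related to this, but none with a clear answer. Suppose I want to print the string s = u'\xe9\xe1' in a terminal that only supports ASCII (for example, LC_ALL=C; python3). Is there a way to make the following the default behavior: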

import sys
s = u'\xe9\xe1'
s = s.encode(sys.stdout.encoding, 'replace').decode(sys.stdout.encoding)
print(s)

In other words, I want printing the string to output something, even garbage, rather than raise an exception (UnicodeEncodeError). I am using Python 3.5.

I want to avoid writing this code for every string that may contain non-ASCII characters.
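For reference, the round trip in the snippet above can be wrapped in a small helper. This is only a sketch of the manual workaround the question wants to avoid repeating; the name `to_printable` and its `encoding` parameter are hypothetical and not part of any library.

```python
def to_printable(text, encoding):
    """Return text with every character the codec cannot encode replaced by '?'."""
    # Encoding with errors='replace' substitutes '?' for unencodable characters;
    # decoding the result gives back a str that is safe to print with that codec.
    return text.encode(encoding, "replace").decode(encoding)


if __name__ == "__main__":
    import sys

    # Fall back to ASCII if the stream does not report an encoding (assumption).
    target = sys.stdout.encoding or "ascii"
    print(to_printable("\xe9\xe1", target))
```

With an ASCII target, both characters come out as question marks; with a UTF-8 target the string passes through unchanged.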

Answer 1

You can do one of three things:

Use the PYTHONIOENCODING environment variable to adjust the error handler for stdout and stderr:
```
export PYTHONIOENCODING=:replace
```
Note the `:`; I did not specify a codec, only the error handler.
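One way to check the effect of the variable is to start a child interpreter with it set and capture what it writes. The sketch below is an illustration, not part of the original answer: `run_child` is a hypothetical helper, and it pins the codec to `ascii` so the result does not depend on the locale of the machine running it.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child Python that prints u'\\xe9\\xe1' with PYTHONIOENCODING set."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        # The raw string passes the escapes through; the child interprets them.
        [sys.executable, "-c", r"print('\xe9\xe1')"],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    print(run_child("ascii:replace"))  # exits cleanly, characters replaced
    print(run_child("ascii:strict"))   # child fails with UnicodeEncodeError
```

With `ascii:replace` the child exits normally and emits replacement characters; with `ascii:strict` it exits with a non-zero status and writes nothing to stdout.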

Replace the stdout TextIOWrapper, setting a different error handler:

import sys
import io

sys.stdout = io.TextIOWrapper(
    sys.stdout.buffer, encoding=sys.stdout.encoding, 
    errors='replace',
    line_buffering=sys.stdout.line_buffering)
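On Python 3.7 and later (so not on the 3.5 interpreter used in the question), `io.TextIOWrapper` also has a `reconfigure()` method that changes the error handler in place, without building a new wrapper. The sketch below demonstrates it on an in-memory stream so it does not touch the real stdout; the `demo` function and the `BytesIO` target are illustrative assumptions.

```python
import io


def demo():
    """Switch an ASCII text stream from 'strict' to 'replace' and write to it."""
    raw = io.BytesIO()
    # newline="\n" keeps the output identical across platforms.
    stream = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")
    stream.reconfigure(errors="replace")  # requires Python 3.7+
    print("\xe9\xe1", file=stream)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    print(demo())
```

When `sys.stdout` is a regular `TextIOWrapper`, the equivalent call would be `sys.stdout.reconfigure(errors="replace")`.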

Create a separate TextIOWrapper instance around sys.stdout.buffer, and pass it as the file argument when printing:

import sys
import io

replacing_stdout = io.TextIOWrapper(
    sys.stdout.buffer, encoding=sys.stdout.encoding, 
    errors='replace',
    line_buffering=sys.stdout.line_buffering)

print(s, file=replacing_stdout)
