如何将 Python 2.x 和 3.x 的字符串编码为 utf-8

Question

我正在尝试将包含西里尔符号（在 utf-8 中）的格式化字符串写入 Unix 管道：

sort_proc.stdin.write("{}\n".format(cyrillic_text).decode('utf-8').encode('utf-8'))

我必须编码，因为 'str' does not support the buffer interface，解码，因为 'ascii' codec can't decode byte 0xd0。所以这段代码按预期在 Python 2.7 中工作。但是 Python 3.4 说 'str' object has no attribute 'decode' 因为 python3 中的字符串文字已经是 "decoded"。所以我知道如何分别为每个版本修复它，但不知道如何为两个版本修复。我找到了一个与重新加载 sys 模块和设置 setdefaultencoding 相关的解决方案，但是这篇文章 why should we NOT use sys.setdefaultencoding 说这只是一个 hack，根本不应该使用。请 post 用最 pythonic 的方式来做这些事情。谢谢。

Answer 1

在整个 Python 2.x 代码中使用 unicode strings（而不是 8 位 str）。这相当于 Python 3.x str 类型。然后，您可以简单地使用 the_string.encode('UTF-8') 来获取字节串（类型 str in 2.x，但类型 bytes in 3.x）。

如果您不需要支持 Python 3.0 到 3.2，您可以在所有字符串文字前加上 u 前缀。在 Python 2.x 中，这会创建一个 unicode 字符串，而在 3.3+ it's supported for backwards compatibility 中却什么都不做。

如何将 Python 2.x 和 3.x 的字符串编码为 utf-8

How to encode string into utf-8 for both Python 2.x and 3.x

python

encode