Raw unicode literal that is valid in Python 2 and Python 3?

Question

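Apparently the ur"" syntax has been disabled in Python 3. However, I need it! "Why?", you may ask. Well, I need the u prefix because it is a unicode string and my code needs to work on Python 2. As for the r prefix, maybe it's not essential, but the markup format I'm using requires a lot of backslashes, and it would help avoid mistakes.

Here is an example that does what I want in Python 2 but is illegal in Python 3: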

tamil_letter_ma = u"\u0bae"
marked_text = ur"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma

After running into this problem, I found http://bugs.python.org/issue15096 and noticed this quote:

It's easy to overcome the limitation.

Would anybody care to offer some ideas?

Related: What exactly do "u" and "r" string flags do in Python, and what are raw string literals?

Answer 1

Unicode strings are the default strings in Python 3.x, so using r alone will produce the same result as ur does in Python 2.

Answer 2

Why not just use a raw string literal (r'....')? You don't need to specify u, because in Python 3 strings are unicode strings.

>>> tamil_letter_ma = "\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
'\\aம\\bthe Tamil\\cletter\\dMa\\e'

To make it work in Python 2.x as well, add the following future import statement at the very beginning of your source code, so that all string literals in the source code become unicode.

from __future__ import unicode_literals
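Putting the pieces together, a minimal sketch of a module that should behave the same on Python 2 and Python 3 might look like the following. The `text_type` name is only a helper introduced here for the type check; it is not part of the original answer.

```python
# -*- coding: utf-8 -*-
# The __future__ import has to come before any other statement in the module.
from __future__ import unicode_literals

# No u prefix needed: every literal in this module is already unicode.
tamil_letter_ma = "\u0bae"

# The r prefix keeps the backslashes of the markup format untouched.
marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma

# Hypothetical helper: type("") is unicode on Python 2 (because of the
# import above) and str on Python 3.
text_type = type("")

print(isinstance(marked_text, text_type))  # True
print(repr(marked_text))
```

One caveat to keep in mind: on Python 2 a raw unicode literal still processes \uXXXX escapes, so a raw literal that itself contains a backslash followed by u may not behave the same on both versions. The markup in the question does not use \u, so it should not be affected.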

Answer 3

The preferred way is to drop the u'' prefix and use from __future__ import unicode_literals, as suggested above. But in your specific case, you could abuse the fact that "ascii-only string" % unicode returns Unicode:

>>> tamil_letter_ma = u"\u0bae"
>>> marked_text = r"\a%s\bthe Tamil\cletter\dMa\e" % tamil_letter_ma
>>> marked_text
u'\\a\u0bae\\bthe Tamil\\cletter\\dMa\\e'
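The same trick can be written as a small script. This is only a sketch, and it assumes Python 2 or Python 3.3+, since the u'' prefix is a syntax error on Python 3.0 through 3.2 (it was restored by PEP 414). The `template` name is introduced here for readability.

```python
# u"" is accepted by Python 2 and again by Python 3.3+ (PEP 414).
tamil_letter_ma = u"\u0bae"

# A plain raw literal: an ASCII-only byte string on Python 2,
# an ordinary (unicode) str on Python 3.
template = r"\a%s\bthe Tamil\cletter\dMa\e"

# On Python 2, formatting an ASCII-only str with a unicode argument
# promotes the result to unicode; on Python 3 everything is already str.
marked_text = template % tamil_letter_ma

print(repr(marked_text))
```

This only works as long as the raw template stays pure ASCII; if the template itself needs non-ASCII characters, the unicode_literals approach is the safer choice.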


python

unicode

python-2.x

python-3.x