How to recover a Unicode string from its ASCII string in Python?

Question

Before asking my question, I'd like to give an example.

u_string = u'\xcb\xa5\xb5'
u_string
Out[79]: 'Ë¥µ'
asc_string = ascii(u_string)
asc_string
Out[81]: "'\\xcb\\xa5\\xb5'"

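Here I end up with a string (asc_string) that contains only ASCII characters.

My question is: if I only have asc_string, how can I convert it back to the original u_string (the Unicode string)?

Thanks,
Martin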

Answer 1

You can decode it like this:

asc_string.encode().decode('unicode-escape')
# "'Ë¥µ'"

I don't know why, but ascii adds an extra set of quotes, which you can strip like this:

asc_string.encode().decode('unicode-escape')[1:-1]
# 'Ë¥µ'
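
As a small sketch of where the extra quotes come from: ascii() returns a repr-style string literal with non-ASCII characters escaped, so the quotes are part of the returned text rather than something added on display. The variable name recovered is only illustrative.

```python
u_string = '\xcb\xa5\xb5'
asc_string = ascii(u_string)

# The original has 3 characters; the ascii() form has three 4-character
# escapes (\xcb, \xa5, \xb5) plus the two surrounding quotes.
print(len(u_string))    # 3
print(len(asc_string))  # 14

# For a pure-ASCII string, ascii() and repr() give the same text.
print(ascii('abc') == repr('abc'))  # True

# Decoding the escapes and dropping the first and last character (the
# quotes) gives back the original string.
recovered = asc_string.encode('ascii').decode('unicode-escape')[1:-1]
print(recovered == u_string)  # True
```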

Answer 2

For this case, the simplest fully correct approach is ast.literal_eval:

>>> import ast
>>> origversion = u'\xcb\xa5\xb5'  # Leading u is unnecessary on Python 3
>>> asciiform = ascii(origversion)
>>> origversion == ast.literal_eval(asciiform)
True

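This works because calling ascii on a string adds quotes and escapes, so the result contains the string literal that would reproduce the original string (it is just repr, restricted to ASCII characters). ast.literal_eval is designed to parse canonical reprs of literals (ASCII-escaped or not) and produce the resulting object, which in this case is a string.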
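
A minimal sketch of the round trip wrapped in a helper. The function name recover_from_ascii and the sample strings are hypothetical, chosen only to exercise quotes, control characters, and a character outside the Basic Multilingual Plane.

```python
import ast


def recover_from_ascii(asc_string: str) -> str:
    """Hypothetical helper: invert ascii() for a str argument."""
    value = ast.literal_eval(asc_string)
    # literal_eval also accepts numbers, lists, etc., so check the type.
    if not isinstance(value, str):
        raise ValueError("input is not the ascii() form of a str")
    return value


# Illustrative inputs: Latin-1 characters, both quote styles,
# a newline, and an emoji.
samples = ['\xcb\xa5\xb5', "it's", 'say "hi"', 'line\nbreak', '\U0001f600']

for s in samples:
    assert recover_from_ascii(ascii(s)) == s
```

Because ast.literal_eval only evaluates literals, it does not execute arbitrary code the way eval would, which matters if asc_string comes from an untrusted source.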


python

unicode

ascii

python-unicode