How to print out strings with unicode escape characters correctly

Question

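I am reading strings from a file that contain embedded unicode escape sequences, \u00e9 for example. When I print a string literal with print(), the escape sequences are converted to the correct characters, but if I read the strings from standard input and print them, print does not convert the escape sequences to unicode characters.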

For example, when I use:

print("Le Condamn\u00e9 \u00e0 mort")

Python correctly prints Le Condamné à mort. However, if I get the same string from stdin, I get: Le Condamn\u00e9 \u00e0 mort

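Does anyone know how to get Python to convert the escape sequences to the correct unicode characters? Also, why does print behave differently when it is given a string literal rather than a string variable?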

Answer 1

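In a string literal, Python interprets \u00e0 as a single Unicode code point, so it is printed as 'à'. When you read the same text from a file, it arrives as plain characters: it is stored as '\\u00e0', that is, six separate characters beginning with a literal backslash. One solution is to find where '\\u00e0' occurs in the list of characters and replace it with '\u00e0'.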
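The difference is easy to see by comparing lengths. This is a minimal sketch in which a raw string literal stands in for text read from a file or stdin (an assumption made for illustration; the variable names are hypothetical):

```python
# In a normal string literal, Python turns \u00e0 into one character.
literal = "\u00e0"

# A raw string keeps the backslash, mimicking text read from a file or stdin.
from_input = r"\u00e0"

print(len(literal))           # 1
print(len(from_input))        # 6
print(literal == from_input)  # False
```

So print treats both values the same way; they are simply different strings by the time print sees them.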

Here is some code that converts each '\\u00e0'-style sequence in a string into the character it is supposed to be.

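def special_char_fix(string):
    # Work on a list so the six escape characters can be replaced in place.
    string = list(string)
    for pl, char in enumerate(string):
        # A literal backslash followed by 'u' starts a \uXXXX sequence.
        if char == '\\' and string[pl + 1:pl + 2] == ['u']:
            # Collect the four hex digits that follow "\u".
            val = ''.join(string[pl + 2:pl + 6])
            # Collapse the six characters into the single decoded character.
            string[pl:pl + 6] = [chr(int(val, 16))]
    return ''.join(string)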
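The same idea can also be written with a regular expression. The following is only a sketch: the helper name decode_u_escapes is hypothetical, it handles only the four-digit \uXXXX form, and it does not combine surrogate pairs.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def decode_u_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

raw = r"Le Condamn\u00e9 \u00e0 mort"  # stands in for a line read from stdin
print(decode_u_escapes(raw))  # Le Condamné à mort
```

Because the pattern requires exactly four hex digits, malformed sequences are left untouched instead of raising an error.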

Answer 2

I believe you are looking for str.encode with the "unicode_escape" codec (the "string-escape" codec exists only in Python 2).

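Example code

s = "Le Condamn\u00e9 \u00e0 mort"
# Encode the non-ASCII characters as escape sequences, then turn the bytes back into str.
ra = s.encode('unicode_escape').decode()
print(ra)

Output

Le Condamn\xe9 \xe0 mort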
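Note that the example above goes from characters to escape sequences. For the situation in the question, the same codec can be applied in the decoding direction. This is a sketch under assumptions: unescape is a hypothetical helper name, and the codec also interprets other backslash escapes such as \n, which may not be wanted for arbitrary input.

```python
def unescape(text):
    """Interpret backslash escapes in text, e.g. a literal \\u00e9 becomes é."""
    # backslashreplace keeps any character outside Latin-1 as a \uXXXX escape,
    # so the unicode_escape decoder can restore it afterwards.
    return text.encode("latin-1", "backslashreplace").decode("unicode_escape")

raw = r"Le Condamn\u00e9 \u00e0 mort"  # stands in for a line read from stdin
print(unescape(raw))  # Le Condamné à mort
```

Encoding to Latin-1 first, rather than UTF-8, keeps characters such as é that are already present in the input from being garbled by the decoder.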
