将 unicode 代码点转换为 Python 中的字母字符串

Question

我有一个 unicode 代码点列表，例如

<U+041D><U+041C><U+0418><U+0426> <U+041D><U+0435><U+0439><U+0440><U+043E><U+0445><U+0438><U+0440><U+0443><U+0440><U+0433><U+0438><U+0438> <U+0438><U+043C>. <U+041D>,<U+041D>.<U+0411><U+0443><U+0440><U+0434><U+0435><U+043D><U+043A><U+043E>

如何将它们转换为常规字符串 - 它们代表特定的字符串，例如 'hello'

Answer 1

使用正则表达式来识别 unicode 点。从每个点中提取十六进制数，将其转换为十进制，并映射到具有该代码的字符：

import re

def decoder(match):
    code = match.group(1) # The code in hex
    return chr(int(code, 16))

uni = # Your string
re.sub("<U\+([0-9a-fA-F]+)>", decoder, uni)
#'НМИЦ Нейрохирургии им. Н,Н.Бурденко'

将 unicode 代码点转换为 Python 中的字母字符串

Convert unicode code points to alphabetical strings in Python

unicode

utf-8

python-3.x