Using f-strings with unicode escapes

Question

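I have strings that look like this: a = "testing test<U+00FA>ing <U+00F3>"

The format won't always be like that, but those bracketed unicode characters will be scattered throughout the code. I want to turn them into the actual unicode characters they represent. I tried this function: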

def replace_unicode(s):
    uni = re.findall(r'<U\+\w\w\w\w>', s)

    for a in uni:
        s = s.replace(a, f'\u{a[3:7]}')
    return s

This successfully finds all the unicode strings, but it won't let me put them together to create a unicode escape this way.

  File "D:/Programming/tests/test.py", line 8
    s = s.replace(a, f'\u{a[3:7]}')
                     ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape

How can I create a unicode escape character, using f-strings or some other method, from the information I'm getting out of the string?

Answer 1

You can use an f-string to build the appropriate argument for int, and the chr function can then use the result to produce the desired character.

for a in uni:
    s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
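
For completeness, here is a rough, self-contained sketch of that loop dropped back into the question's function. The name replace_unicode_chr is only an illustrative choice, and the pattern is the one from the question. This works where f'\u{...}' does not because a \u escape in a string literal is resolved when the source is parsed, before any {...} substitution, whereas chr() builds the character at runtime.

```python
import re

def replace_unicode_chr(s):
    # Find every token of the form <U+XXXX> (pattern taken from the question).
    uni = re.findall(r'<U\+\w\w\w\w>', s)

    for a in uni:
        # a[3:7] is the four characters after "<U+"; parse them as hex
        # and let chr() build the character at runtime.
        s = s.replace(a, chr(int(f'0x{a[3:7]}', base=16)))
    return s

print(replace_unicode_chr("testing test<U+00FA>ing <U+00F3>"))  # -> testing testúing ó
```

Note that \w also matches non-hex characters, so a token such as <U+ZZZZ> would make int() raise a ValueError with this pattern.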

Answer 2

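Nice, but you don't actually need the f-string. int(a[3:7], base=16) works just fine.

Also, it makes more sense to use re.sub() than re.findall() plus str.replace(). I would also restrict the regex to hex digits and capture them in a group.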
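
A quick check of that claim, as a minimal sketch (the variable names are just for illustration): when base=16 is passed explicitly, int() accepts the hex digits with or without the "0x" prefix.

```python
# The "0x" prefix is optional when base=16 is given explicitly.
with_prefix = int('0x00FA', base=16)
without_prefix = int('00FA', base=16)

print(with_prefix == without_prefix)  # -> True
print(chr(without_prefix))            # -> ú
```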

import re

def replace_unicode(s):
    pattern = re.compile(r'<U\+([0-9A-F]{4})>')
    return pattern.sub(lambda match: chr(int(match.group(1), base=16)), s)

a = "testing test<U+00FA>ing <U+00F3>"
print(replace_unicode(a))  # -> testing testúing ó
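
To illustrate what restricting the pattern to hex digits buys you, here is a small sketch with a few made-up sample strings (the samples are assumptions, not from the question). Tokens that are not four uppercase hex digits are simply left alone instead of raising an error.

```python
import re

def replace_unicode(s):
    # Same function as above: only four uppercase hex digits are captured.
    pattern = re.compile(r'<U\+([0-9A-F]{4})>')
    return pattern.sub(lambda match: chr(int(match.group(1), base=16)), s)

# Hypothetical inputs: a valid token, a non-hex token, a lowercase token.
samples = ["caf<U+00E9>", "<U+ZZZZ> stays", "<U+00fa> stays too"]
for sample in samples:
    print(replace_unicode(sample))
# -> café
# -> <U+ZZZZ> stays
# -> <U+00fa> stays too
```

If lowercase hex digits can occur in your data, widening the character class to [0-9A-Fa-f] would be one way to accept them.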
