Converting utf-8 characters to scandic letters

Question

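I am struggling to convert a string in which scandic letters appear as UTF-8 byte escapes. For example, I would like to convert the following string:

test_string = "\xc3\xa4\xc3\xa4abc"

into:

test_string = "ääabc"

The end goal is to send this string to a Slack channel via the API. I did some testing and found that Slack handles scandic letters properly. I tried the following command:

test_string = test_string.encode('latin1').decode('utf-8')

but this doesn't change the string at all.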

The same goes for a more brute-force approach:

def simple_scand_convert(string):
   string = string.replace("\xc3\xa4", "ä")

Again, this doesn't change the string at all. Any hints or material where I could look for a solution?

Answer 1

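Based on the original question and the discussion in the comments, I suspect that you are simply not saving the result of the conversion. Python strings are immutable, and assigning to a parameter inside a function only rebinds the local name, so it does nothing to the caller's string: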

In [42]: def change_string(s):
    ...:     s = "hello world"
    ...:
    ...: test_s = "still here"
    ...: change_string(test_s)
    ...: print(test_s)
still here

Instead, you need to return the conversion result from the function and reassign the variable:

In [43]: def change_string(s):
    ...:     s = s.encode('latin1').decode('u8')
    ...:     return s
    ...:
    ...: test_s = "\xc3\xa4\xc3\xa4abc"
    ...: test_s = change_string(test_s)
    ...: print(test_s)
ääabc
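
To show why the round trip works, here is a minimal sketch of the same idea. The name fix_mojibake is my own and not something from the question. It assumes every character in the input has a code point below 256, which is the case for the example string.

```python
def fix_mojibake(s: str) -> str:
    """Reinterpret a str whose characters are really UTF-8 byte values."""
    # latin1 maps code points 0-255 one-to-one onto bytes 0-255,
    # so this step recovers the raw bytes hidden in the string.
    raw = s.encode("latin1")
    # Those bytes are the UTF-8 encoding of the intended text.
    return raw.decode("utf-8")


test_s = "\xc3\xa4\xc3\xa4abc"
print(test_s.encode("latin1"))   # the raw bytes behind the garbled text
test_s = fix_mojibake(test_s)    # reassign, otherwise the result is lost
print(test_s)
```

The important part is still the reassignment on the second-to-last line: calling the function without storing its return value leaves test_s unchanged.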

Answer 2

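I can't reproduce your "reading the soup message from an incoming webhook" code snippet; therefore, my answer is based on hard-coded data and shows in detail how the Python-specific text encodings raw_unicode_escape and unicode_escape work: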

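# hard-coded: the first half holds literal backslash escapes, the second half mojibake characters
test_string = "\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4"
print('test_string                  ', test_string)
# raw_unicode_escape leaves backslashes alone and maps U+0000..U+00FF to single bytes
print('.encode("raw_unicode_escape")',
  test_string.encode( 'raw_unicode_escape'))
# unicode_escape interprets the \xNN sequences found in those bytes
print('.decode(    "unicode_escape")',
  test_string.encode( 'raw_unicode_escape').decode( 'unicode_escape'))
# latin1 turns each character back into one byte; the bytes are then read as UTF-8
print('.encode("latin1").decode()   ',
  test_string.encode( 'raw_unicode_escape').decode( 'unicode_escape').
              encode( 'latin1').decode( 'utf-8'))

Output: \SO069394.py

test_string                   \xc3\xa5\xc3\xa4___Ã¥Ã¤
.encode("raw_unicode_escape") b'\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4'
.decode(    "unicode_escape") Ã¥Ã¤___Ã¥Ã¤
.encode("latin1").decode()    åä___åä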
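
If it is unclear which of the two forms the incoming text has, the steps above could be wrapped in one helper. This is only a sketch under my own assumptions: repair_text is a hypothetical name, and the fallback behaviour (return the input unchanged when a step fails) is my choice, not something from the question. It is also a heuristic: genuine text that contains backslash sequences such as \n, or that happens to look like UTF-8 when viewed as latin1, would be altered too.

```python
def repair_text(s: str) -> str:
    """Best-effort repair of text holding \\xNN escapes or UTF-8 mojibake."""
    # Step 1: turn literal escapes (backslash, x, two hex digits)
    # into the single characters they stand for.
    try:
        s = s.encode("raw_unicode_escape").decode("unicode_escape")
    except UnicodeDecodeError:
        pass  # e.g. a lone trailing backslash: keep the text as it is
    # Step 2: reinterpret characters 0-255 as bytes and read them as UTF-8.
    try:
        return s.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s  # not mojibake after all: leave it untouched


print(repair_text("\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4"))
print(repair_text("ääabc"))
```

The first call prints åä___åä, and the second prints ääabc unchanged because the already-correct string is not valid UTF-8 when viewed as latin1 bytes.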

python

string

encoding

utf-8

latin