Separate accents from their letters

I'm looking for a function that takes a precomposed (accented) letter and splits it the way you would have to type it on a US-INTL keyboard, like so:

'ȯ' becomes ".o"
'â' becomes "^a"
'ë' becomes "\"e"
'è' becomes "`e"
'é' becomes "'e"
'ñ' becomes "~n"
'ç' becomes ",c"

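and so on.

But when searching for this, I could only find functions that remove the accents entirely, which is not what I want.

Here is what I want to accomplish:

Expand this string: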

ër íí àha lá eïsch

into this string:

"er 'i'i `aha l'a e"isch

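You can use a dictionary to map each character to its replacement, then loop over the dictionary to perform the actual replacements.

# Map each accented letter to the key sequence that types it.
# Only letters listed here are replaced; add í, à, á, ï, etc. as needed.
word_rep = dict(zip(['ȯ','â','ë','è','é','ñ','ç'],
                    ['.o','^a','\"e','`e','\'e','~n',',c']))
mystr = 'ër íí àha lá eïsch'
# Replace every occurrence of each accented letter in turn.
for key,value in word_rep.items():
    mystr = mystr.replace(key,value)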
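
The same dictionary idea can also be applied in a single pass with str.translate, which accepts a table that maps one character to a multi-character string. The following is only a sketch: the names PAIRS, TABLE and expand_accents are made up for illustration, and the mapping covers just the letters from the question plus the ones in the sample string. It assumes the input can be normalized to NFC so that every accented letter is a single code point.

```python
import unicodedata as ud

# Hypothetical mapping (not from the original answer): accented letter -> keys typed.
# Extend it with every accented letter you expect to see.
PAIRS = {
    'ȯ': '.o', 'â': '^a', 'ë': '"e', 'ï': '"i',
    'è': '`e', 'à': '`a', 'é': "'e", 'í': "'i", 'á': "'a",
    'ñ': '~n', 'ç': ',c',
}

# str.maketrans requires single-character keys, so normalize each key to NFC
# (one precomposed code point) before building the table.
TABLE = str.maketrans({ud.normalize('NFC', k): v for k, v in PAIRS.items()})

def expand_accents(text):
    """Replace each known precomposed letter with its dead-key sequence."""
    # Normalize the input too, so decomposed input still matches the table.
    return ud.normalize('NFC', text).translate(TABLE)

print(expand_accents('ër íí àha lá eïsch'))
# prints: "er 'i'i `aha l'a e"isch
```

Any accented letter missing from PAIRS passes through unchanged, so the table has to list every letter you care about.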

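The following uses Unicode decomposition to separate the combining marks from the Latin letters, a regular expression to swap each combining character with its letter, and then a translation table to convert the combining marks into the keys used on an international keyboard: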

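import unicodedata as ud
import re

# Combining mark -> key typed on a US-International keyboard.
replacements = {'\N{COMBINING DOT ABOVE}':'.',
                '\N{COMBINING CIRCUMFLEX ACCENT}':'^',
                '\N{COMBINING DIAERESIS}':'"',
                '\N{COMBINING GRAVE ACCENT}':'`',
                '\N{COMBINING ACUTE ACCENT}':"'",
                '\N{COMBINING TILDE}':'~',
                '\N{COMBINING CEDILLA}':','}

combining = ''.join(replacements.keys())
typing = ''.join(replacements.values())

# One-to-one table from each combining mark to its key.
translation = str.maketrans(combining,typing)

s = 'ër íí àha lá eïsch'
# NFD splits each accented letter into base letter + combining mark.
s = ud.normalize('NFD',s)
# Swap the pair so the mark comes before its letter (lowercase a, e, i, o, u, n, c only).
s = re.sub(rf'([aeiounc])([{combining}])',r'\2\1',s)
# Turn each combining mark into the corresponding key.
s = s.translate(translation)
print(s)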

Output:

"er 'i'i `aha l'a e"isch
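
The decomposition approach can be wrapped in a reusable function that is not limited to the lowercase letters a, e, i, o, u, n and c. This is a rough sketch rather than a definitive implementation: the names DEAD_KEYS and split_accents are hypothetical, and the choice to leave unknown marks attached to their letter (recomposed with NFC at the end) is an assumption about the desired behavior.

```python
import unicodedata as ud

# Combining mark -> key typed on a US-International keyboard
# (same mapping as the answer above; the name DEAD_KEYS is hypothetical).
DEAD_KEYS = {
    '\N{COMBINING DOT ABOVE}': '.',
    '\N{COMBINING CIRCUMFLEX ACCENT}': '^',
    '\N{COMBINING DIAERESIS}': '"',
    '\N{COMBINING GRAVE ACCENT}': '`',
    '\N{COMBINING ACUTE ACCENT}': "'",
    '\N{COMBINING TILDE}': '~',
    '\N{COMBINING CEDILLA}': ',',
}

def split_accents(text):
    """Move each known accent in front of its letter as a typed key."""
    out = []
    # NFD yields the base letter followed by its combining marks.
    for ch in ud.normalize('NFD', text):
        key = DEAD_KEYS.get(ch)
        if key is not None and out and out[-1].isalpha():
            # Known mark following a letter: put its key before that letter.
            out.insert(len(out) - 1, key)
        else:
            out.append(ch)
    # Assumption: marks we do not know stay on their letter, so recompose them.
    return ud.normalize('NFC', ''.join(out))

print(split_accents('ër íí àha lá eïsch'))
# prints: "er 'i'i `aha l'a e"isch
```

Because the check is str.isalpha() rather than a fixed character class, uppercase letters such as 'É' are handled as well, while a letter like 'å', whose ring is not in DEAD_KEYS, comes back unchanged.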