Separate accents from their letters
I am looking for a function that takes a precomposed (accented) letter and splits it into the key sequence you would type on a US-INTL keyboard, like so:
'ȯ' becomes ".o"
'â' becomes "^a"
'ë' becomes "\"e"
'è' becomes "`e"
'é' becomes "'e"
'ñ' becomes "~n"
'ç' becomes ",c"
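and so on.
But when searching for this problem, I could only find functions that remove the accents entirely, which is not what I want.
Here is what I want to accomplish:
Expand this string: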
ër íí àha lá eïsch
Into this string:
"er 'i'i `aha l'a e"isch
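You can use a dictionary to map each character to its replacement, then loop over the dictionary and apply each replacement to the string.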
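# Map each accented character to its US-International key sequence.
# Only the characters listed here are replaced; í, à, á and ï from the
# sample string still need their own entries.
word_rep = dict(zip(['ȯ','â','ë','è','é','ñ','ç'],
                    ['.o','^a','\"e','`e','\'e','~n',',c']))
mystr = 'ër íí àha lá eïsch'
for key, value in word_rep.items():
    mystr = mystr.replace(key, value)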
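The same dictionary idea can be sketched as a single pass with `str.translate` instead of one `replace` call per entry. This is only a minimal sketch: `ACCENT_KEYS` and `expand_accents` are hypothetical names, and the table is extended with í, à, á and ï so that the sample string from the question is fully covered. Keys are normalized to NFC because `str.maketrans` requires single-character keys.

```python
import unicodedata as ud

# Hypothetical lookup table: precomposed character -> US-International keys.
# Extend it with every character you expect to see.
ACCENT_KEYS = {
    'ȯ': '.o', 'â': '^a', 'ë': '"e', 'è': '`e', 'é': "'e",
    'ñ': '~n', 'ç': ',c', 'í': "'i", 'à': '`a', 'á': "'a", 'ï': '"i',
}

# str.maketrans needs single-character keys, so force each key to NFC.
TABLE = str.maketrans({ud.normalize('NFC', k): v for k, v in ACCENT_KEYS.items()})

def expand_accents(text):
    """Replace every character found in ACCENT_KEYS with its key sequence."""
    # Normalize the input the same way so that decomposed input also matches.
    return ud.normalize('NFC', text).translate(TABLE)

print(expand_accents('ër íí àha lá eïsch'))  # "er 'i'i `aha l'a e"isch
```

Characters that are missing from the table pass through unchanged, so the table has to list every accented character you care about.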
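The following uses Unicode decomposition to separate the combining marks from the Latin letters, a regular expression to swap each combining character with its letter, and then a translation table to convert the combining marks into the keys used on the international keyboard: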
import unicodedata as ud
import re

# Combining mark -> key typed on the US-International layout.
replacements = {'\N{COMBINING DOT ABOVE}':'.',
                '\N{COMBINING CIRCUMFLEX ACCENT}':'^',
                '\N{COMBINING DIAERESIS}':'"',
                '\N{COMBINING GRAVE ACCENT}':'`',
                '\N{COMBINING ACUTE ACCENT}':"'",
                '\N{COMBINING TILDE}':'~',
                '\N{COMBINING CEDILLA}':','}

combining = ''.join(replacements.keys())
typing = ''.join(replacements.values())
translation = str.maketrans(combining, typing)

s = 'ër íí àha lá eïsch'
# Decompose each accented letter into base letter + combining mark.
s = ud.normalize('NFD', s)
# Swap the pair so the mark comes before its letter.
s = re.sub(rf'([aeiounc])([{combining}])', r'\2\1', s)
# Turn each combining mark into the corresponding key.
s = s.translate(translation)
print(s)
Output:
"er 'i'i `aha l'a e"isch
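If you need this as a reusable function, the decomposition approach could be wrapped up roughly as follows. Treat it as a sketch under assumptions rather than a definitive implementation: `DEAD_KEYS` and `to_us_intl` are hypothetical names, the pattern is widened to any ASCII letter (including uppercase), and the final NFC step is an added choice so that marks missing from the table are recomposed instead of being left decomposed.

```python
import re
import unicodedata as ud

# Combining mark -> key typed before the letter on a US-International layout.
DEAD_KEYS = {
    '\N{COMBINING DOT ABOVE}': '.',
    '\N{COMBINING CIRCUMFLEX ACCENT}': '^',
    '\N{COMBINING DIAERESIS}': '"',
    '\N{COMBINING GRAVE ACCENT}': '`',
    '\N{COMBINING ACUTE ACCENT}': "'",
    '\N{COMBINING TILDE}': '~',
    '\N{COMBINING CEDILLA}': ',',
}

# An ASCII letter immediately followed by one of the known combining marks.
PATTERN = re.compile(rf"([A-Za-z])([{''.join(DEAD_KEYS)}])")

def to_us_intl(text):
    """Rewrite accented letters as key + letter, e.g. 'é' -> "'e"."""
    decomposed = ud.normalize('NFD', text)
    # Emit the key for the mark first, then the base letter.
    swapped = PATTERN.sub(lambda m: DEAD_KEYS[m.group(2)] + m.group(1), decomposed)
    # Recompose any letter + mark pairs that were not in DEAD_KEYS.
    return ud.normalize('NFC', swapped)

print(to_us_intl('ër íí àha lá eïsch'))  # "er 'i'i `aha l'a e"isch
```

Doing the swap and the key lookup in one substitution callback avoids the separate `translate` step, and it leaves any combining mark that does not directly follow an ASCII letter untouched.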