How can I convert "letter emojis" in a string to regular (English) letters in Python?

Question

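I would like to know how to convert "letter emojis" such as '' and '' in a Python string to their regular letter form (the plain ASCII English letters).

For example: '' should become 'Rotem' and 'ëⓁᴏ' should become 'HeLlo', etc.

Thanks for any answers :)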

Answer 1

It is hard to cover every case.

My attempt:

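import unicodedata

s = ''
s = 'ëⓁᴏ'

def normalize_compatibility(s):
    # NFKD replaces compatibility characters (circled or styled letters, ...)
    # with their plain equivalents and splits accents into combining marks.
    return unicodedata.normalize('NFKD', s)

def remove_accents(s):
    # Keep only characters whose category is a letter (L*). This drops the
    # combining marks, but also digits, spaces and punctuation.
    return ''.join(c for c in unicodedata.normalize('NFD', s)
                   if unicodedata.category(c).startswith('L'))

print(s)
s = normalize_compatibility(s)
print(s)
s = remove_accents(s)
print(s)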

This handles some of the cases.

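Unfortunately, for U+1D0F LATIN LETTER SMALL CAPITAL O the Unicode database has no decomposition data that can help us. As a fallback, the Unicode name returned by unicodedata.name() may help, e.g. by matching it with a regular expression, but that means tracking down all the similar characters, and it fails whenever the letter does not appear in the name.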
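The name-based fallback could look roughly like the sketch below. The helper letter_from_name and its regular expression are illustrative assumptions, not part of any library; the pattern only covers names of the form "LATIN CAPITAL/SMALL LETTER X" and "LATIN LETTER SMALL CAPITAL X", and small capitals are mapped to lowercase by choice.

```python
import re
import unicodedata

# Hypothetical pattern: matches only names that end in a single base letter.
_NAME_RE = re.compile(
    r"^LATIN (?:(CAPITAL|SMALL) LETTER|LETTER SMALL CAPITAL) ([A-Z])$"
)

def letter_from_name(ch):
    """Return the ASCII letter found in ch's Unicode name, or ch unchanged."""
    name = unicodedata.name(ch, "")  # "" when the character has no name
    m = _NAME_RE.match(name)
    if m is None:
        return ch
    case, letter = m.groups()
    # Small capitals (case is None) are treated as lowercase here.
    return letter if case == "CAPITAL" else letter.lower()

print("".join(letter_from_name(c) for c in "Hell\u1d0f"))
```

Names that carry extra words, such as "LATIN SMALL LETTER E WITH DIAERESIS", are deliberately left alone, so accents still have to be handled by normalization.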

However, there is also a table of confusable characters (not part of the main database), and a Python library built on it: https://pypi.org/project/confusables/, see the last example there.

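You will probably need to combine the two methods, and eventually add by hand some characters that are neither confusable nor related but are used as replacements for other characters anyway.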
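A minimal sketch of such a combination, assuming a hand-maintained table for the characters that normalization misses. EXTRA_MAP and to_plain_letters are hypothetical names, and the table contains a single example entry; a confusables lookup could take its place.

```python
import unicodedata

# Hypothetical hand-maintained table for characters without a decomposition.
EXTRA_MAP = {
    "\u1d0f": "o",  # LATIN LETTER SMALL CAPITAL O
}

def to_plain_letters(s):
    # Step 1: compatibility decomposition (styled and circled letters, accents).
    decomposed = unicodedata.normalize("NFKD", s)
    out = []
    for c in decomposed:
        # Step 2: drop combining marks left over from accented letters.
        if unicodedata.category(c).startswith("M"):
            continue
        # Step 3: fall back to the manual table, else keep the character.
        out.append(EXTRA_MAP.get(c, c))
    return "".join(out)

print(to_plain_letters("\u00eb\u24c1\u1d0f"))
```

Unlike the category filter in the attempt above, this version keeps digits, spaces and punctuation, since it removes only combining marks.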

