Python 2.7: Removal of accents from a <str> - still won't work

Question

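I'm receiving str data through a websocket connection and trying to remove the accents from it (as well as lowercasing the text and turning ' ' into '-').

And I'm failing, even after going through the hundreds of questions asked and answered here.

Here is the part of the code that tries to do this, with parsed[4][7:] being the text I want to convert: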

    if parsed[4][:6]=="!strat":
        shiftedtxt=''
        txt=parsed[4][7:].lower().decode('unicode-escape')
        hope=''.join((c for c in unicodedata.normalize('NFD', txt) if unicodedata.category(c) != 'Mn'))
        for i in hope:
            if i==' ':
                shiftedtxt+='-'
            else:
                shiftedtxt+=i
        ws.send(room+"|http://pokestrat.com/fiche_pokemon/"+shiftedtxt+".php")

Typically, I'm trying to turn 'Ténéfix' into 'tenefix'.

Following the advice in answers on this site, I'm using the

''.join((c for c in unicodedata.normalize('NFD', txt) if unicodedata.category(c) != 'Mn'))

method.

When trying step by step:

'éô'.decode('unicode-escape')

yields

u'\xe9\xf4'

and

>>> s=u'\xe9\xf4'
>>> ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

yields

u'eo'

So things should... work? Yet they don't. For example, 'ténéfix' comes back as 'tA©nA©fix', which I can't explain. Why?

Edit: the full code is here: http://pastebin.com/aJ1Rk1pV

Answer 1

txt=parsed[4][7:].lower().decode('unicode-escape')

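Are you sure you want to parse part of the submitted text as a Python unicode string literal? That seems unlikely: !strat Ténéfix doesn't contain any Python string escapes (such as \uNNNN or \n). Since the websocket hands you UTF-8 bytes, you more likely want to decode them as UTF-8: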

txt=parsed[4][7:].decode('utf-8').lower()

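Note that the lowercasing has to happen after the byte sequence has been converted to Unicode text, since lower() on a byte string cannot be relied on to handle non-ASCII characters such as É.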
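To make the corrected order of operations concrete, here is a minimal sketch of the whole conversion. The helper name slugify is hypothetical (it does not appear in the original code), and the input is assumed to be UTF-8 encoded bytes, as the websocket delivers them. It is written so that it should behave the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import unicodedata


def slugify(raw_bytes):
    """Hypothetical helper: UTF-8 bytes -> lowercase, accent-free, hyphenated text."""
    # Decode first, so that lower() operates on Unicode text rather than raw bytes.
    text = raw_bytes.decode('utf-8').lower()
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    # Drop the combining marks (category 'Mn') and keep the base letters.
    stripped = u''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')
    # Replace spaces with hyphens, as the loop in the question does.
    return stripped.replace(u' ', u'-')


# b'T\xc3\xa9n\xc3\xa9fix' is 'Ténéfix' encoded as UTF-8.
print(slugify(b'T\xc3\xa9n\xc3\xa9fix'))  # tenefix
```

The str.replace call stands in for the character-by-character loop in the question; both produce the same result for plain spaces.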

When trying step by step: 'éô'.decode('unicode-escape') yields u'\xe9\xf4'

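For that to happen, your terminal must be sending the characters éô as ISO-8859-1 (or the similar Windows code page 1252). That is a different encoding from the UTF-8 the websocket gives you, hence the different result.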
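The difference can be reproduced without a terminal or a websocket. The sketch below is only an illustration: strip_marks is a hypothetical name for the join/normalize expression from the question, and the byte strings are built by hand to stand in for what each source would send.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_marks(s):
    """The accent-stripping expression from the question, wrapped in a function."""
    return u''.join(c for c in unicodedata.normalize('NFD', s)
                    if unicodedata.category(c) != 'Mn')


text = u'\xe9\xf4'  # the two characters 'éô'

latin1_bytes = text.encode('iso-8859-1')  # one byte per character
utf8_bytes = text.encode('utf-8')         # two bytes per character

# With no backslash escapes present, 'unicode-escape' maps every byte to the
# code point of the same value, exactly as ISO-8859-1 would.
from_terminal = latin1_bytes.decode('unicode-escape')
from_websocket = utf8_bytes.decode('unicode-escape')

print(from_terminal == text)   # True
print(from_websocket == text)  # False

# The same mis-decoding applied to the UTF-8 bytes of 'ténéfix'
# gives the result reported in the question.
garbled = strip_marks(b't\xc3\xa9n\xc3\xa9fix'.decode('unicode-escape'))
print(garbled == u'tA\xa9nA\xa9fix')  # True
```

Each é arrives as the two bytes C3 A9, which are read as the characters Ã and ©. NFD then splits Ã into A plus a combining tilde, the tilde is removed as a combining mark, and A© is what remains.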

python

string

unicode

utf-8

python-2.7