Why do I get a KeyError after using difflib on a unicode string?

Question

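I'm trying to use difflib to compare words and sentences (in this case, something like a dictionary). When I try to look up the difflib output among the keys of a dictionary, I get a KeyError. Can anyone explain why this happens? Everything works fine when I don't use difflib.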

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import difflib
import operator

lst = ['król']
word = 'król'

dct = {}
for order in lst:
    word_match_ratio = difflib.SequenceMatcher(None, word, order).ratio()

    dct[order] = word_match_ratio
    print order
    print('%s %s' % (order, word_match_ratio))


sorted_matching_words = sorted(dct.items(), key=operator.itemgetter(1))
sorted_matching_words = str(sorted_matching_words.pop()[:1])
x = len(sorted_matching_words) - 3
word = sorted_matching_words[3:x]

print word


def translate(someword):
    someword = trans_dct[someword]
    print(someword)
    return someword

trans_dct = {
    "król": 'king'
}
print trans_dct
word = translate(word)

Expected output: king

Instead I get:

Traceback (most recent call last):
  File "D:/Python/Testing stuff.py", line 64, in <module>
    word = translate(word)
  File "D:/Python/Playground/Testing stuff.py", line 56, in translate
    someword = trans_dct[someword]
KeyError: 'kr\xf3l'

I don't understand why this happens. It looks like difflib is doing something strange, because when I do something like this:

uni = 'kr\xf3l'
print uni


def translate(word):
    word = dct1[word]
    print(word)
    return word

dct1 = {
    "król": 'king'
}
print dct1
word = translate('kr\xf3l')
print word

everything works fine.

Answer 1

The problem is not difflib but the way you extract word:

sorted_matching_words = sorted(dct.items(), key=operator.itemgetter(1))
# sorted_matching_words = [(u'kr\xf3l', 1.0)]

sorted_matching_words = str(sorted_matching_words.pop()[:1])
# sorted_matching_words = "(u'kr\\xf3l',)"

x = len(sorted_matching_words) - 3
word = sorted_matching_words[3:x]
# word = 'kr\\xf3l'  (a literal backslash followed by xf3, not the character ó)

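You should not convert sorted_matching_words.pop()[:1] to a string, because it is a tuple. str() renders each tuple element with its __repr__ method, which is why ó turns into the literal characters \xf3 (the backslash is escaped). The resulting text is no longer equal to the dictionary key. Just take the first element of the tuple instead: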

In [34]: translate(sorted_matching_words[-1][0])
king
Out[34]: u'king'
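
For completeness, here is a minimal sketch of the whole flow without the str() round trip. It is only an illustration, not the asker's exact script: the candidates list and the names scores, best_match and stringified are made up for the example, and max() is swapped in for the sort-then-pop step. It is written so that it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import difflib
import operator

trans_dct = {'król': 'king'}
candidates = ['król', 'królowa', 'kot']  # hypothetical word list
word = 'król'

# Score every candidate against the query word.
scores = {c: difflib.SequenceMatcher(None, word, c).ratio() for c in candidates}

# max() hands back the (candidate, ratio) pair itself, so the first item is
# the original unicode object and can be used as a dictionary key directly.
best_match, best_ratio = max(scores.items(), key=operator.itemgetter(1))

# By contrast, str() of a tuple produces repr-style text (parentheses,
# quotes, and on Python 2 an escaped \xf3), which is not a key in trans_dct.
stringified = str((best_match,))

print(best_match == word)        # True
print(stringified in trans_dct)  # False
print(trans_dct[best_match])     # king

# The standard library also offers a shortcut that returns the closest
# matches as a list, best first.
print(difflib.get_close_matches(word, candidates, n=1) == [word])  # True
```

If you only need the closest word, difflib.get_close_matches is probably the simplest option; keep in mind that it returns an empty list when nothing reaches its cutoff (0.6 by default), so check for that before indexing.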
