Sort list by best match from another list

I'm trying to sort a list by another list, but the two are not 100% identical.

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

Out: list1_ordered_by_list2 = ["2banana", "mango", "1 apple"]

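I'm happy to use jellyfish.levenshtein_distance() for the comparison, but I'm not sure how to compare every element of list1 with every element of list2 and return list1 sorted in the order of list2.

It's worth mentioning that my two lists have the same length. A more general solution would be very valuable, though!

Bonus points if the two lists have a different number of items and I can still get the mapping between them. For example: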

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2", "apple"]

Out: list1_ordered_by_list2 = ["1 apple", "2banana", "mango"]

This may be rather complicated. Let me know if further clarification is needed. I hope you can help. Thanks.

You need to create a ranking function based on jellyfish.levenshtein_distance() that returns the index of the minimum distance, and pass it to sorted() as the key.

from jellyfish import levenshtein_distance as ld

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple23"]

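def rank(x):
    # Distance from x to every entry of list2; the rank is the index of the closest one.
    dist = [ld(x, s) for s in list2]
    return dist.index(min(dist))  # on a tie, index() keeps the earliest entry

# 'mango' is 7 edits from both '0.5 mango 1-' and 'apple23', so it gets rank 1.
print(sorted(list1, key=rank))  # --> ['2banana', 'mango', '1 apple']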

-- The following is more of a comment --

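Note that the code below shows the actual ld values. We can see that

(mango) <-> (apple2) < (mango) <-> (0.5 mango 1-)

that is, 6 < 7, so mango is matched to apple2 rather than to 0.5 mango 1-.

The last line of the output pairs each element of list1 with the index of its closest match in list2.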

from jellyfish import levenshtein_distance as ld

list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]
list3 = []
for x in list1:
    offset = 0
    for idx, y in enumerate(list2):
        ld_value = ld(x, y)
        print('({}) <-> ({}) --> {}'.format(x,y,ld_value))
        if idx == 0:
            _min = ld_value
            continue
        else:
            if ld_value < _min:
                _min = ld_value
                offset = idx
    list3.append((x, offset))
    print()
print(list3)

Output

(1 apple) <-> (3bana2na 2+) --> 10
(1 apple) <-> (0.5 mango 1-) --> 10
(1 apple) <-> (apple2) --> 3

(2banana) <-> (3bana2na 2+) --> 5
(2banana) <-> (0.5 mango 1-) --> 10
(2banana) <-> (apple2) --> 7

(mango) <-> (3bana2na 2+) --> 9
(mango) <-> (0.5 mango 1-) --> 7
(mango) <-> (apple2) --> 6

[('1 apple', 2), ('2banana', 0), ('mango', 2)]
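The output above shows two elements ('1 apple' and 'mango') landing on the same index. One possible way around that, sketched here as an assumption rather than as part of the original answers, is to hand out each list2 slot only once, taking the closest pairs first. The names `levenshtein` and `assign_one_to_one` are hypothetical; `levenshtein` is a plain-Python stand-in so the sketch runs without jellyfish, and `jellyfish.levenshtein_distance` could be passed as `distance` instead.

```python
from itertools import product


def levenshtein(a, b):
    """Plain dynamic-programming edit distance (stand-in for jellyfish)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def assign_one_to_one(items, reference, distance=levenshtein):
    """Greedily pair each item with a distinct reference index, closest pairs first.

    Assumes the items are unique; items left without a free slot are omitted.
    """
    pairs = sorted(
        (distance(x, y), i, j)
        for (i, x), (j, y) in product(enumerate(items), enumerate(reference))
    )
    assigned, used = {}, set()
    for _, i, j in pairs:
        if i not in assigned and j not in used:
            assigned[i] = j
            used.add(j)
    return {items[i]: j for i, j in assigned.items()}


list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

mapping = assign_one_to_one(list1, list2)
# Items without a slot (possible when list2 is shorter) are sorted last.
ordered = sorted(list1, key=lambda x: mapping.get(x, len(list2)))
print(mapping)
print(ordered)
```

With the distances listed above, 'apple2' goes to '1 apple' (distance 3), so 'mango' falls back to '0.5 mango 1-' (distance 7). This is a greedy heuristic, not an optimal assignment.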

Building on Lior's rank function, you can use difflib to get the example output:

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2", "apple"]

import difflib

def rank(x):
    dist = [len(list(difflib.ndiff(x, s))) for s in list2]
    return dist.index(min(dist))

>>> sorted(list1, key=rank)
['1 apple', '2banana', 'mango']

Or with your first example:

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

>>> sorted(list1, key=rank)
['2banana', '1 apple', 'mango']
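A possible variation, offered as a sketch rather than as the answerer's method: the length of an ndiff grows with the length of the strings being compared, so short reference entries tend to win. difflib.SequenceMatcher.ratio() returns a similarity between 0 and 1 that is normalized by the combined length. The function name `rank_by_ratio` is hypothetical, and the reference list is passed in explicitly instead of being read from a global.

```python
from difflib import SequenceMatcher


def rank_by_ratio(x, reference):
    """Index of the reference entry with the highest similarity ratio to x."""
    scores = [SequenceMatcher(None, x, s).ratio() for s in reference]
    return scores.index(max(scores))  # on a tie, the earliest entry wins


list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

ordered = sorted(list1, key=lambda x: rank_by_ratio(x, list2))
print(ordered)
```

On the first example this should pair 'mango' with '0.5 mango 1-', because the whole word is contained in that entry.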

It may be faster to use some form of fuzzy matching against the reference list. You can use the regex module, or use get_close_matches from difflib:

list1 = ["1 apple","2banana","mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

import difflib

def rank2(s, ref=list2):
    try:
        w=difflib.get_close_matches(s, ref)
        return ref.index(w[0])
    except IndexError:
        return len(ref)+1

>>> sorted(list1, key=rank2)
['2banana', '1 apple', 'mango']
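In the result above, 'mango' ends up last because no entry of list2 reaches the default similarity cutoff of 0.6 in get_close_matches, so rank2 falls back to its out-of-range value. A minimal sketch of exposing that threshold follows; the name `rank_close` and its `cutoff` argument are hypothetical wrappers, while `n` and `cutoff` are real parameters of difflib.get_close_matches.

```python
import difflib


def rank_close(s, ref, cutoff=0.6):
    """Index in ref of the closest match to s, or len(ref) when nothing passes the cutoff."""
    matches = difflib.get_close_matches(s, ref, n=1, cutoff=cutoff)
    return ref.index(matches[0]) if matches else len(ref)


list1 = ["1 apple", "2banana", "mango"]
list2 = ["3bana2na 2+", "0.5 mango 1-", "apple2"]

strict = sorted(list1, key=lambda s: rank_close(s, list2))
loose = sorted(list1, key=lambda s: rank_close(s, list2, cutoff=0.5))
print(strict)
print(loose)
```

Lowering the cutoff lets 'mango' match '0.5 mango 1-', at the cost of accepting weaker matches in general, so the right value depends on the data.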