Merge sort a list of words

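I implemented the merge sort algorithm by following a post on Code Review. It worked well on a list of integers, but I wanted to try a more practical application, so I downloaded a text file of random English words and tried to sort them.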

However, it does nothing.

def merge_sort(seq):
    if len(seq) <= 1:  # empty or single-item lists are already sorted
        return seq
    else:
        # recursive step. Break the list into chunks of 1
        mid_index = len(seq) // 2
        left  = merge_sort( seq[:mid_index] )
        right = merge_sort( seq[mid_index:] )

    left_counter, right_counter, master_counter = 0, 0, 0

    while left_counter < len(left) and right_counter < len(right):
        if left[left_counter] < right[right_counter]:
            seq[master_counter] = left[left_counter]
            left_counter += 1
        else:
            seq[master_counter] = right[right_counter]
            right_counter += 1

        master_counter += 1

    # Copy over whatever remains. Either left or right is already exhausted,
    # so only one of these two loops will execute.

    while left_counter < len(left):  # left list isn't done yet
        seq[master_counter] = left[left_counter]
        left_counter   += 1
        master_counter += 1

    while right_counter < len(right):  # right list isn't done yet
        seq[master_counter] = right[right_counter]
        right_counter   += 1
        master_counter  += 1

    return seq

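I think the problem is that it is working on a list of lists rather than a single list. Also, the function has no way of knowing what to sort by. Is that right?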

This is how I'm calling it:

with open('words.txt') as f:
    list_of_words = f.read().splitlines()
    new = merge_sort(list_of_words)
    print(new == sorted(list_of_words, key=len))
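A self-contained way to see what this call actually does is sketched below. It uses a compact copy of the merge_sort above (with the base case written as len(seq) <= 1) and a short, hypothetical word list in place of words.txt. Without a key, the merge compares strings with <, which is lexicographic, so the result matches sorted(list_of_words) rather than sorted(list_of_words, key=len). Because the merge writes back into seq, the top-level call also sorts list_of_words in place, so new and list_of_words end up being the same list.

```python
def merge_sort(seq):
    # Compact copy of the question's merge sort; the base case uses <= 1
    # so that an empty list returns instead of recursing forever.
    if len(seq) <= 1:
        return seq
    mid = len(seq) // 2
    left = merge_sort(seq[:mid])    # slices are copies
    right = merge_sort(seq[mid:])
    i = j = k = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:      # plain < on str is lexicographic
            seq[k] = left[i]
            i += 1
        else:
            seq[k] = right[j]
            j += 1
        k += 1
    # Copy whatever is left in the half that is not exhausted.
    while i < len(left):
        seq[k] = left[i]
        i += 1
        k += 1
    while j < len(right):
        seq[k] = right[j]
        j += 1
        k += 1
    return seq  # the caller's list, now sorted in place


# Hypothetical stand-in for the lines of words.txt.
list_of_words = ['pear', 'fig', 'banana', 'kiwi', 'apple']
new = merge_sort(list_of_words)

print(new)                                    # ['apple', 'banana', 'fig', 'kiwi', 'pear']
print(new is list_of_words)                   # True: sorted in place
print(new == sorted(list_of_words))           # True: lexicographic order
print(new == sorted(list_of_words, key=len))  # False: length order differs
```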

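As you pointed out, the problem is that merge_sort has no way of knowing what to sort by: it compares the words directly with <, which orders strings lexicographically, not by length. You can change merge_sort to accept an additional argument, a function that returns the key for each element of the sequence, just like sorted does: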

def merge_sort(seq, key=lambda x: x):

Then change the comparison to call the passed function instead of comparing the elements directly:

if key(left[left_counter]) < key(right[right_counter]):
    seq[master_counter] = left[left_counter]
    left_counter += 1
else:
    seq[master_counter] = right[right_counter]
    right_counter += 1

Finally, pass the key along in the recursive calls:

left  = merge_sort( seq[:mid_index], key )
right = merge_sort( seq[mid_index:], key )
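Putting the three changes together, the revised function looks roughly like this. It is a sketch that keeps the question's variable names; the only other change is the base case, written as len(seq) <= 1 so that an empty list returns immediately. Like the original, it writes the result back into seq, so the list passed in is sorted in place.

```python
def merge_sort(seq, key=lambda x: x):
    """Sort seq in place by key(item) and return it.

    Not stable: on equal keys the element from the right half is taken first.
    """
    if len(seq) <= 1:
        return seq
    mid_index = len(seq) // 2
    # Pass key down so every level of the recursion compares the same way.
    left = merge_sort(seq[:mid_index], key)
    right = merge_sort(seq[mid_index:], key)

    left_counter = right_counter = master_counter = 0
    while left_counter < len(left) and right_counter < len(right):
        # Compare the keys, not the items themselves.
        if key(left[left_counter]) < key(right[right_counter]):
            seq[master_counter] = left[left_counter]
            left_counter += 1
        else:
            seq[master_counter] = right[right_counter]
            right_counter += 1
        master_counter += 1

    # Only one of these two loops does any work.
    while left_counter < len(left):
        seq[master_counter] = left[left_counter]
        left_counter += 1
        master_counter += 1
    while right_counter < len(right):
        seq[master_counter] = right[right_counter]
        right_counter += 1
        master_counter += 1
    return seq


print(merge_sort(['pear', 'fig', 'banana'], key=len))  # ['fig', 'pear', 'banana']
print(merge_sort(['b', 'C', 'a'], key=str.lower))      # ['a', 'b', 'C']
```

Any one-argument callable works as the key, including built-ins such as len and str.lower.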

With these changes, it works as you would expect:

merge_sort([4, 6, 2, 1]) # [1, 2, 4, 6]
merge_sort(['foo', 'a', 'bar', 'foobar'], key=len) # ['a', 'bar', 'foo', 'foobar']
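If you keep a check like print(new == sorted(list_of_words, key=len)) from the question, it can still print False after this fix, because words of equal length may come out in a different order (for the reason explained in the next paragraph). A more robust check is whether every adjacent pair is in key order. The is_sorted_by helper below is hypothetical, not a standard-library function, and the ours list is simply copied from the output shown above.

```python
def is_sorted_by(seq, key=lambda x: x):
    """Hypothetical helper: True if key(a) <= key(b) for every adjacent pair."""
    return all(key(a) <= key(b) for a, b in zip(seq, seq[1:]))


words = ['foo', 'a', 'bar', 'foobar']
ours = ['a', 'bar', 'foo', 'foobar']   # merge_sort(words, key=len) output shown above
builtin = sorted(words, key=len)       # ['a', 'foo', 'bar', 'foobar']

print(ours == builtin)                 # False: 'foo' and 'bar' tie on length
print(is_sorted_by(ours, key=len))     # True
print(is_sorted_by(builtin, key=len))  # True
print(sorted(ours) == sorted(words))   # True: same elements, nothing lost
```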

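One thing to be aware of, though: the result can differ from sorted, because this merge_sort is not stable. When two elements have equal keys, the else branch takes the one from the right half first, so equal elements can end up in a different order than in the input (sorted, by contrast, is guaranteed to be stable):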

merge_sort(['foo', 'a', 'bar', 'foobar'], key=len) # ['a', 'bar', 'foo', 'foobar']
sorted(['foo', 'a', 'bar', 'foobar'], key=len) # ['a', 'foo', 'bar', 'foobar']
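A minimal sketch of a stable variant: the only substantive change is comparing with <= so that, on a tie, the element from the left half (which came earlier in the input) is taken first. Leftovers are copied with a slice assignment instead of two loops, which does the same work. With this change the result matches sorted on the example above.

```python
def merge_sort(seq, key=lambda x: x):
    """Sort seq in place by key(item); equal keys keep their input order."""
    if len(seq) <= 1:
        return seq
    mid_index = len(seq) // 2
    left = merge_sort(seq[:mid_index], key)
    right = merge_sort(seq[mid_index:], key)

    left_counter = right_counter = master_counter = 0
    while left_counter < len(left) and right_counter < len(right):
        # <= instead of <: on a tie, take from the left half first.
        # This is what makes the sort stable.
        if key(left[left_counter]) <= key(right[right_counter]):
            seq[master_counter] = left[left_counter]
            left_counter += 1
        else:
            seq[master_counter] = right[right_counter]
            right_counter += 1
        master_counter += 1

    # At most one of these two slices is non-empty; copy it to the end of seq.
    seq[master_counter:] = left[left_counter:] + right[right_counter:]
    return seq


words = ['foo', 'a', 'bar', 'foobar']
print(merge_sort(list(words), key=len))                            # ['a', 'foo', 'bar', 'foobar']
print(merge_sort(list(words), key=len) == sorted(words, key=len))  # True
```

list(words) passes a copy, so the original words list is left untouched by the in-place sort.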