如何制作快速的功能归并排序和生成器

How to make a fast functional mergesort and generators

您好,我正在尝试在 python 中创建合并排序算法。我想尽可能地实现它的功能,并尽可能避免循环和赋值。现在我的功能还不够快。

有人告诉我,我必须更改我的拆分函数,因为它让我变慢了,因为它的 运行 时间为 O(n)

我现在无法想象我该如何改变它。我想我应该去掉所有支持 iterators/generators 的列表。但正如我现在所看到的,如果我应该从拆分功能中退出,我必须从根本上改变我的设计。

此外,我想知道我所有的 if else 语句是否都需要太多计算机思考?

另外,我是否可以通过柯里化改进代码?

def split(xs, half):
    return (xs[:half], xs[half:])

def msort(xs):
    length = len(xs)
    if length <= 1:
        yield from xs
    else:
        yield from merge(*(msort(x) for x in split(xs, length//2)))

def merge(xs, ys, x=None, y=None):
    if not y:
        y = get(ys)
    if not x:
        x = get(xs)
    if x and y:
        if x <= y:
            yield x
            yield from merge(xs, ys, y=y)
        else:
            yield y
            yield from merge(xs, ys, x=x)
    else:
        if x:
            yield x
            yield from xs
        if y:
            yield y
            yield from ys

def get(xs):
    try:
        return next(xs)
    except StopIteration:
        return None

你的 split 函数确实是这里的瓶颈,因为它创建了子列表,其中包含它所代表的所有内存开销。如果 msort() 旨在处理迭代器,那么您可以将 split() 更改为 return 两个迭代器(例如,一个用于奇数位置,一个用于偶数位置)。这将消除排序过程中的大部分内存分配和管理。

这是我的想法的一个例子:

from itertools import tee,islice
def split(xs,size):
    i1,i2 = tee(xs)                   # two iterators
    s1,s2 = size//2 + size%2, size//2 # corresponding sizes (evens, odds)
    yield islice(i1,0,None,2),s1      # even positions (iterator,size)
    yield islice(i2,1,None,2),s2      # odd positions  (iterator,size)

def msort(xs,size=None):                       # use iterators when recursing
    if size is None:size = len(xs)             # first call expects a list
    if size == 1: yield from xs; return        # don't split single item
    yield from merge(*(msort(i,s)                   # msort(iterator)
                       for i,s in split(xs,size)))  # split as iterators

def merge(xs, ys, x=None, y=None):
    if not y:
        y = next(ys,None)   # using next(..,default) instead of try/except
    if not x:
        x = next(xs,None)
    if x and y:
        if x <= y:
            yield x
            yield from merge(xs, ys, y=y)
        else:
            yield y
            yield from merge(xs, ys, x=x)
    else:
        if x:
            yield x
            yield from xs
        if y:
            yield y
            yield from ys

输出:

print(*msort([2,3,5,6,3,2,3,4])) # 2 2 3 3 3 4 5 6
print(*msort([4,6,5,1,7,2,3]))   # 1 2 3 4 5 6 7
print(*msort([5,6,4,1,2,3]))     # 1 2 3 4 5 6
print(*msort([5,4,3,2,1]))       # 1 2 3 4 5
print(*msort([21,32,17,23]))     # 17 21 23 32
print(*msort([17, 3,10]))        # 3 10 17
print(*msort([5, 3]))            # 3 5
print(*msort([1]))               # 1

请注意,这仍然有很大的改进空间,它只是迭代器方法的一个示例。

如果您将 even/odd 策略简化到算法中,则可以直接在 msort() 中删除 split() 函数以支持切片方法:

from itertools import islice
def mySort(A,start=None,step=None):
    if start is None: start,step = 0,1
    eStart, eStep = start     ,step*2
    oStart, oStep = start+step,step*2
    if eStart >= len(A):
        yield from islice(A,oStart,None,oStep)
    elif oStart >=len(A):
        yield from islice(A,eStart,None,eStep)
    else:
        yield from merge(mySort(A,eStart,eStep),mySort(A,oStart,oStep))
    
def merge(A,B,a=None,b=None):
    if a is None: a = next(A,None)
    if b is None: b = next(B,None)
    if   a is None: yield b; yield from B
    elif b is None: yield a; yield from A
    elif a<b:       yield a; yield from merge(A,B,b=b)
    else:           yield b; yield from merge(A,B,a=a)

请注意,在这两种情况下,递归深度均为 2*len(A),因此您将无法对大于约 490 个元素的数组进行排序。