如何制作快速的功能归并排序和生成器
How to make a fast functional mergesort and generators
您好,我正在尝试在 python 中创建合并排序算法。我想尽可能地实现它的功能,并尽可能避免循环和赋值。现在我的功能还不够快。
有人告诉我,我必须更改我的拆分函数,因为它让我变慢了,因为它的 运行 时间为 O(n)
我现在无法想象我该如何改变它。我想我应该去掉所有支持 iterators/generators 的列表。但正如我现在所看到的,如果我应该从拆分功能中退出,我必须从根本上改变我的设计。
此外,我想知道我所有的 if else 语句是否都需要太多计算机思考?
另外,我是否可以通过柯里化改进代码?
def split(xs, half):
return (xs[:half], xs[half:])
def msort(xs):
length = len(xs)
if length <= 1:
yield from xs
else:
yield from merge(*(msort(x) for x in split(xs, length//2)))
def merge(xs, ys, x=None, y=None):
if not y:
y = get(ys)
if not x:
x = get(xs)
if x and y:
if x <= y:
yield x
yield from merge(xs, ys, y=y)
else:
yield y
yield from merge(xs, ys, x=x)
else:
if x:
yield x
yield from xs
if y:
yield y
yield from ys
def get(xs):
try:
return next(xs)
except StopIteration:
return None
你的 split 函数确实是这里的瓶颈,因为它创建了子列表,其中包含它所代表的所有内存开销。如果 msort() 旨在处理迭代器,那么您可以将 split() 更改为 return 两个迭代器(例如,一个用于奇数位置,一个用于偶数位置)。这将消除排序过程中的大部分内存分配和管理。
这是我的想法的一个例子:
from itertools import tee,islice
def split(xs,size):
i1,i2 = tee(xs) # two iterators
s1,s2 = size//2 + size%2, size//2 # corresponding sizes (evens, odds)
yield islice(i1,0,None,2),s1 # even positions (iterator,size)
yield islice(i2,1,None,2),s2 # odd positions (iterator,size)
def msort(xs,size=None): # use iterators when recursing
if size is None:size = len(xs) # first call expects a list
if size == 1: yield from xs; return # don't split single item
yield from merge(*(msort(i,s) # msort(iterator)
for i,s in split(xs,size))) # split as iterators
def merge(xs, ys, x=None, y=None):
if not y:
y = next(ys,None) # using next(..,default) instead of try/except
if not x:
x = next(xs,None)
if x and y:
if x <= y:
yield x
yield from merge(xs, ys, y=y)
else:
yield y
yield from merge(xs, ys, x=x)
else:
if x:
yield x
yield from xs
if y:
yield y
yield from ys
输出:
print(*msort([2,3,5,6,3,2,3,4])) # 2 2 3 3 3 4 5 6
print(*msort([4,6,5,1,7,2,3])) # 1 2 3 4 5 6 7
print(*msort([5,6,4,1,2,3])) # 1 2 3 4 5 6
print(*msort([5,4,3,2,1])) # 1 2 3 4 5
print(*msort([21,32,17,23])) # 17 21 23 32
print(*msort([17, 3,10])) # 3 10 17
print(*msort([5, 3])) # 3 5
print(*msort([1])) # 1
请注意,这仍然有很大的改进空间,它只是迭代器方法的一个示例。
如果您将 even/odd 策略简化到算法中,则可以直接在 msort() 中删除 split() 函数以支持切片方法:
from itertools import islice
def mySort(A,start=None,step=None):
if start is None: start,step = 0,1
eStart, eStep = start ,step*2
oStart, oStep = start+step,step*2
if eStart >= len(A):
yield from islice(A,oStart,None,oStep)
elif oStart >=len(A):
yield from islice(A,eStart,None,eStep)
else:
yield from merge(mySort(A,eStart,eStep),mySort(A,oStart,oStep))
def merge(A,B,a=None,b=None):
if a is None: a = next(A,None)
if b is None: b = next(B,None)
if a is None: yield b; yield from B
elif b is None: yield a; yield from A
elif a<b: yield a; yield from merge(A,B,b=b)
else: yield b; yield from merge(A,B,a=a)
请注意,在这两种情况下,递归深度均为 2*len(A),因此您将无法对大于约 490 个元素的数组进行排序。
您好,我正在尝试在 python 中创建合并排序算法。我想尽可能地实现它的功能,并尽可能避免循环和赋值。现在我的功能还不够快。
有人告诉我,我必须更改我的拆分函数,因为它让我变慢了,因为它的 运行 时间为 O(n)
我现在无法想象我该如何改变它。我想我应该去掉所有支持 iterators/generators 的列表。但正如我现在所看到的,如果我应该从拆分功能中退出,我必须从根本上改变我的设计。
此外,我想知道我所有的 if else 语句是否都需要太多计算机思考?
另外,我是否可以通过柯里化改进代码?
def split(xs, half):
return (xs[:half], xs[half:])
def msort(xs):
length = len(xs)
if length <= 1:
yield from xs
else:
yield from merge(*(msort(x) for x in split(xs, length//2)))
def merge(xs, ys, x=None, y=None):
if not y:
y = get(ys)
if not x:
x = get(xs)
if x and y:
if x <= y:
yield x
yield from merge(xs, ys, y=y)
else:
yield y
yield from merge(xs, ys, x=x)
else:
if x:
yield x
yield from xs
if y:
yield y
yield from ys
def get(xs):
try:
return next(xs)
except StopIteration:
return None
你的 split 函数确实是这里的瓶颈,因为它创建了子列表,其中包含它所代表的所有内存开销。如果 msort() 旨在处理迭代器,那么您可以将 split() 更改为 return 两个迭代器(例如,一个用于奇数位置,一个用于偶数位置)。这将消除排序过程中的大部分内存分配和管理。
这是我的想法的一个例子:
from itertools import tee,islice
def split(xs,size):
i1,i2 = tee(xs) # two iterators
s1,s2 = size//2 + size%2, size//2 # corresponding sizes (evens, odds)
yield islice(i1,0,None,2),s1 # even positions (iterator,size)
yield islice(i2,1,None,2),s2 # odd positions (iterator,size)
def msort(xs,size=None): # use iterators when recursing
if size is None:size = len(xs) # first call expects a list
if size == 1: yield from xs; return # don't split single item
yield from merge(*(msort(i,s) # msort(iterator)
for i,s in split(xs,size))) # split as iterators
def merge(xs, ys, x=None, y=None):
if not y:
y = next(ys,None) # using next(..,default) instead of try/except
if not x:
x = next(xs,None)
if x and y:
if x <= y:
yield x
yield from merge(xs, ys, y=y)
else:
yield y
yield from merge(xs, ys, x=x)
else:
if x:
yield x
yield from xs
if y:
yield y
yield from ys
输出:
print(*msort([2,3,5,6,3,2,3,4])) # 2 2 3 3 3 4 5 6
print(*msort([4,6,5,1,7,2,3])) # 1 2 3 4 5 6 7
print(*msort([5,6,4,1,2,3])) # 1 2 3 4 5 6
print(*msort([5,4,3,2,1])) # 1 2 3 4 5
print(*msort([21,32,17,23])) # 17 21 23 32
print(*msort([17, 3,10])) # 3 10 17
print(*msort([5, 3])) # 3 5
print(*msort([1])) # 1
请注意,这仍然有很大的改进空间,它只是迭代器方法的一个示例。
如果您将 even/odd 策略简化到算法中,则可以直接在 msort() 中删除 split() 函数以支持切片方法:
from itertools import islice
def mySort(A,start=None,step=None):
if start is None: start,step = 0,1
eStart, eStep = start ,step*2
oStart, oStep = start+step,step*2
if eStart >= len(A):
yield from islice(A,oStart,None,oStep)
elif oStart >=len(A):
yield from islice(A,eStart,None,eStep)
else:
yield from merge(mySort(A,eStart,eStep),mySort(A,oStart,oStep))
def merge(A,B,a=None,b=None):
if a is None: a = next(A,None)
if b is None: b = next(B,None)
if a is None: yield b; yield from B
elif b is None: yield a; yield from A
elif a<b: yield a; yield from merge(A,B,b=b)
else: yield b; yield from merge(A,B,a=a)
请注意,在这两种情况下,递归深度均为 2*len(A),因此您将无法对大于约 490 个元素的数组进行排序。