将数字列表拆分为 n 个块,使这些块具有(接近)相等的总和并保持原始顺序
Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order
这不是标准的分区问题,因为我需要维护列表中元素的顺序。
例如,如果我有一个列表
[1, 6, 2, 3, 4, 1, 7, 6, 4]
我想要两个块,那么拆分应该给
[[1, 6, 2, 3, 4, 1], [7, 6, 4]]
每边总和为 17。对于三个块,结果将是
[[1, 6, 2, 3], [4, 1, 7], [6, 4]]
对于 12、12 和 10 的总和。
编辑补充说明
我目前将总和除以块数并将其用作目标,然后迭代直到接近该目标。问题是某些数据集可能会搞乱算法,例如试图将以下内容分成 3:-
[95, 15, 75, 25, 85, 5]
总和为 300,目标为 100。第一个块的总和为 95,第二个总和为 90,第三个总和为 110,5 为 'leftover'。将它附加到它应该出现的位置将得到 95、90、115,其中更多 'reasonable' 的解决方案将是 110、100、90。
结束编辑
背景:
我有一个包含不同高度的文本(歌词)的列表,我想将文本分成任意数量的列。目前我根据所有行的总高度计算目标高度,但显然这是一个一致的低估,在某些情况下会导致次优解决方案(最后一列明显更高)。
如果有两个所需的子列表,我可能会这样解决这个问题。它可能没有它应该的那样有效,但它是第一次削减。
def divide(l):
total = sum(l)
half = total / 2
l1 = []
l2 = []
for e in l:
if half - e >= 0 or half > abs(half - e):
l1.append(e)
half -= e
else:
l2.append(e)
return (l1, l2)
您可以在此处查看实际效果:
(l1, l2) = divide([1, 6, 2, 3, 4, 1, 7, 6, 4])
print(l1)
# [1, 6, 2, 3, 4, 1]
print(l2)
#[7, 6, 4]
(l1, l2) = divide([1,1,10,10])
print(l1)
# [1, 1, 10]
print(l2)
#[10]
我会把其他案例留给你作为练习。 :)
我认为一个好的方法是对输入列表进行排序。然后将最小和最大添加到一个列表中。第二小和第二大到下一个列表以此类推,直到所有元素都添加到列表中。
def divide_list(A):
A.sort()
l = 0
r = len(A) - 1
l1,l2= [],[]
i = 0
while l < r:
ends = [A[l], A[r]]
if i %2 ==0:
l1.extend(ends)
else:
l2.extend(ends)
i +=1
l +=1
r -=1
if r == l:
smaller = l1 if sum(l1) < sum(l2) else l2
smaller.append(A[r])
return l1, l2
myList = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print divide_list(myList)
myList = [1,10,10,1]
print divide_list(myList)
输出
([1, 7, 2, 6], [1, 6, 3, 4, 4])
([1, 10], [1, 10])
这有点晚了,但我想出了一个函数来做你需要的,它需要第二个参数来告诉它应该如何拆分列表
import math
my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]
def partition(my_list, split):
solution = []
total = sum(my_list)
div = total / split
div = math.ceil(div)
criteria = [div] * (total // div)
criteria.append(total - sum(criteria)) if sum(criteria) != total else criteria
temp = []
pivot = 0
for crit in criteria:
for count in range(len(my_list) + 1):
if sum(my_list[pivot:count]) == crit:
solution.append(my_list[pivot:count])
pivot = count
break
return solution
print(partition(my_list, 2)) # Outputs [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
print(partition(my_list, 3)) # Outputs [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
它会失败 4 个分区,因为你在问题中明确表示要维持订单
4 divisions = [9, 9, 9, 7]
而你的序列不匹配
这种方法定义分区边界,将数组分成大致相等数量的元素,然后反复搜索更好的分区,直到找不到更多为止。它不同于大多数其他已发布的解决方案,因为它希望通过尝试多个不同的分区来找到最佳解决方案。其他解决方案试图在单次遍历数组时创建良好的分区,但我想不出保证最优的单次遍历算法。
此处的代码是此算法的有效实现,但可能难以理解,因此在末尾包含一个更具可读性的版本作为附录。
def partition_list(a, k):
if k <= 1: return [a]
if k >= len(a): return [[x] for x in a]
partition_between = [(i+1)*len(a)/k for i in range(k-1)]
average_height = float(sum(a))/k
best_score = None
best_partitions = None
count = 0
while True:
starts = [0]+partition_between
ends = partition_between+[len(a)]
partitions = [a[starts[i]:ends[i]] for i in range(k)]
heights = map(sum, partitions)
abs_height_diffs = map(lambda x: abs(average_height - x), heights)
worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
worst_height_diff = average_height - heights[worst_partition_index]
if best_score is None or abs(worst_height_diff) < best_score:
best_score = abs(worst_height_diff)
best_partitions = partitions
no_improvements_count = 0
else:
no_improvements_count += 1
if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
return best_partitions
count += 1
move = -1 if worst_height_diff < 0 else 1
bound_to_move = 0 if worst_partition_index == 0\
else k-2 if worst_partition_index == k-1\
else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
else worst_partition_index
direction = -1 if bound_to_move < worst_partition_index else 1
partition_between[bound_to_move] += move * direction
def print_best_partition(a, k):
print 'Partitioning {0} into {1} partitions'.format(a, k)
p = partition_list(a, k)
print 'The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p))
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)
b = [1, 10, 10, 1]
print_best_partition(b, 2)
import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)
d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)
根据您对此的处理方式,可能需要进行一些修改。例如,为了确定是否找到了最佳分区,当分区之间没有高度差时,该算法停止,它没有找到比连续 5 次以上迭代看到的最好的东西更好的东西,或者在 100 次之后总迭代次数作为一个包罗万象的停止点。您可能需要调整这些常量或使用不同的方案。如果你的身高形成了一个复杂的价值观景观,知道什么时候停止可能会遇到试图逃避局部最大值之类的经典问题。
输出
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]
Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]
Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]
Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]
编辑
添加了新的测试用例 [95, 15, 75, 25, 85, 5],此方法可以正确处理。
附录
此版本的算法更易于阅读和理解,但由于较少利用内置 Python 功能,因此较长一些。然而,它的执行时间似乎相当,甚至更快。
#partition list a into k partitions
def partition_list(a, k):
#check degenerate conditions
if k <= 1: return [a]
if k >= len(a): return [[x] for x in a]
#create a list of indexes to partition between, using the index on the
#left of the partition to indicate where to partition
#to start, roughly partition the array into equal groups of len(a)/k (note
#that the last group may be a different size)
partition_between = []
for i in range(k-1):
partition_between.append((i+1)*len(a)/k)
#the ideal size for all partitions is the total height of the list divided
#by the number of paritions
average_height = float(sum(a))/k
best_score = None
best_partitions = None
count = 0
no_improvements_count = 0
#loop over possible partitionings
while True:
#partition the list
partitions = []
index = 0
for div in partition_between:
#create partitions based on partition_between
partitions.append(a[index:div])
index = div
#append the last partition, which runs from the last partition divider
#to the end of the list
partitions.append(a[index:])
#evaluate the partitioning
worst_height_diff = 0
worst_partition_index = -1
for p in partitions:
#compare the partition height to the ideal partition height
height_diff = average_height - sum(p)
#if it's the worst partition we've seen, update the variables that
#track that
if abs(height_diff) > abs(worst_height_diff):
worst_height_diff = height_diff
worst_partition_index = partitions.index(p)
#if the worst partition from this run is still better than anything
#we saw in previous iterations, update our best-ever variables
if best_score is None or abs(worst_height_diff) < best_score:
best_score = abs(worst_height_diff)
best_partitions = partitions
no_improvements_count = 0
else:
no_improvements_count += 1
#decide if we're done: if all our partition heights are ideal, or if
#we haven't seen improvement in >5 iterations, or we've tried 100
#different partitionings
#the criteria to exit are important for getting a good result with
#complex data, and changing them is a good way to experiment with getting
#improved results
if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
return best_partitions
count += 1
#adjust the partitioning of the worst partition to move it closer to the
#ideal size. the overall goal is to take the worst partition and adjust
#its size to try and make its height closer to the ideal. generally, if
#the worst partition is too big, we want to shrink the worst partition
#by moving one of its ends into the smaller of the two neighboring
#partitions. if the worst partition is too small, we want to grow the
#partition by expanding the partition towards the larger of the two
#neighboring partitions
if worst_partition_index == 0: #the worst partition is the first one
if worst_height_diff < 0: partition_between[0] -= 1 #partition too big, so make it smaller
else: partition_between[0] += 1 #partition too small, so make it bigger
elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
if worst_height_diff < 0: partition_between[-1] += 1 #partition too small, so make it bigger
else: partition_between[-1] -= 1 #partition too big, so make it smaller
else: #the worst partition is in the middle somewhere
left_bound = worst_partition_index - 1 #the divider before the partition
right_bound = worst_partition_index #the divider after the partition
if worst_height_diff < 0: #partition too big, so make it smaller
if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the right bigger
partition_between[right_bound] -= 1
else: #the partition on the left is smaller than the one on the right, so make the one on the left bigger
partition_between[left_bound] += 1
else: #partition too small, make it bigger
if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
partition_between[left_bound] -= 1
else: #the partition on the left is smaller than the one on the right, so make the one on the right smaller
partition_between[right_bound] += 1
def print_best_partition(a, k):
#simple function to partition a list and print info
print ' Partitioning {0} into {1} partitions'.format(a, k)
p = partition_list(a, k)
print ' The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p))
#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)
b = [1, 10, 10, 1]
print_best_partition(b, 2)
import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)
d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)
这里有一些代码 returns 每个子列表的 2-ples 切片索引。
weights = [1, 6, 2, 3, 4, 1, 7, 6, 4]
def balance_partitions(weights:list, n:int=2) -> tuple:
if n < 1:
raise ValueError("Parameter 'n' must be 2+")
target = sum(weights) // n
results = []
cost = 0
start = 0
for i, w in enumerate(weights):
delta = target - cost
cost += w
if cost >= target:
if i == 0 or cost - target <= delta:
results.append( (start, i+1) )
start = i+1
elif cost - target > delta:
# Better if we didn't include this one.
results.append( (start, i) )
start = i
cost -= target
if len(results) == n-1:
results.append( (start, len(weights)) )
break
return tuple(results)
def print_parts(w, n):
result = balance_partitions(w, n)
print("Suggested partition indices: ", result)
for t in result:
start,end = t
sublist = w[start:end]
print(" - ", sublist, "(sum: {})".format(sum(sublist)))
print(weights, '=', sum(weights))
for i in range(2, len(weights)+1):
print_parts(weights, i)
输出为:
[1, 6, 2, 3, 4, 1, 7, 6, 4] = 34
Suggested partition indices: ((0, 6), (6, 9))
- [1, 6, 2, 3, 4, 1] (sum: 17)
- [7, 6, 4] (sum: 17)
Suggested partition indices: ((0, 4), (4, 7), (7, 9))
- [1, 6, 2, 3] (sum: 12)
- [4, 1, 7] (sum: 12)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 3), (3, 5), (5, 7), (7, 9))
- [1, 6, 2] (sum: 9)
- [3, 4] (sum: 7)
- [1, 7] (sum: 8)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 4), (4, 6), (6, 7), (7, 9))
- [1, 6] (sum: 7)
- [2, 3] (sum: 5)
- [4, 1] (sum: 5)
- [7] (sum: 7)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 9))
- [1, 6] (sum: 7)
- [2] (sum: 2)
- [3, 4] (sum: 7)
- [1] (sum: 1)
- [7] (sum: 7)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 9))
- [1, 6] (sum: 7)
- [2] (sum: 2)
- [3] (sum: 3)
- [4] (sum: 4)
- [1] (sum: 1)
- [7] (sum: 7)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
- [1, 6] (sum: 7)
- [2] (sum: 2)
- [3] (sum: 3)
- [4] (sum: 4)
- [1] (sum: 1)
- [7] (sum: 7)
- [6] (sum: 6)
- [4] (sum: 4)
Suggested partition indices: ((0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
- [1] (sum: 1)
- [6] (sum: 6)
- [2] (sum: 2)
- [3] (sum: 3)
- [4] (sum: 4)
- [1] (sum: 1)
- [7] (sum: 7)
- [6] (sum: 6)
- [4] (sum: 4)
这是我目前得到的最好的 O(n) 贪心算法。
这个想法是贪婪地将列表中的项目附加到一个块,直到当前块的总和超过该点块的平均预期总和。平均预期总和不断更新。这个解决方案并不完美,但正如我所说,它是 O(n) 并且在我的测试中效果不错。我渴望听到反馈和改进建议。
我在代码中留下了调试打印语句以提供一些文档。请随时评论它们以查看每个步骤中发生的情况。
代码
def split_list(lst, chunks):
#print(lst)
#print()
chunks_yielded = 0
total_sum = sum(lst)
avg_sum = total_sum/float(chunks)
chunk = []
chunksum = 0
sum_of_seen = 0
for i, item in enumerate(lst):
#print('start of loop! chunk: {}, index: {}, item: {}, chunksum: {}'.format(chunk, i, item, chunksum))
if chunks - chunks_yielded == 1:
#print('must yield the rest of the list! chunks_yielded: {}'.format(chunks_yielded))
yield chunk + lst[i:]
raise StopIteration
to_yield = chunks - chunks_yielded
chunks_left = len(lst) - i
if to_yield > chunks_left:
#print('must yield remaining list in single item chunks! to_yield: {}, chunks_left: {}'.format(to_yield, chunks_left))
if chunk:
yield chunk
yield from ([x] for x in lst[i:])
raise StopIteration
sum_of_seen += item
if chunksum < avg_sum:
#print('appending {} to chunk {}'.format(item, chunk))
chunk.append(item)
chunksum += item
else:
#print('yielding chunk {}'.format(chunk))
yield chunk
# update average expected sum, because the last yielded chunk was probably not perfect:
avg_sum = (total_sum - sum_of_seen)/(to_yield - 1)
chunks_yielded += 1
chunksum = item
chunk = [item]
测试代码
import random
lst = [1, 6, 2, 3, 4, 1, 7, 6, 4]
#lst = [random.choice(range(1,101)) for _ in range(100)]
chunks = 3
print('list: {}, avg sum: {}, chunks: {}\n'.format(lst, sum(lst)/float(chunks), chunks))
for chunk in split_list(lst, chunks):
print('chunk: {}, sum: {}'.format(chunk, sum(chunk)))
TESTS 与您的列表:
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 17.0, chunks: 2
chunk: [1, 6, 2, 3, 4, 1], sum: 17
chunk: [7, 6, 4], sum: 17
---
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 11.33, chunks: 3
chunk: [1, 6, 2, 3], sum: 12
chunk: [4, 1, 7], sum: 12
chunk: [6, 4], sum: 10
---
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 8.5, chunks: 4
chunk: [1, 6, 2], sum: 9
chunk: [3, 4, 1], sum: 8
chunk: [7], sum: 7
chunk: [6, 4], sum: 10
---
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 6.8, chunks: 5
chunk: [1, 6], sum: 7
chunk: [2, 3, 4], sum: 9
chunk: [1, 7], sum: 8
chunk: [6], sum: 6
chunk: [4], sum: 4
TESTS,随机列表长度为 100,元素从 1 到 100(省略随机列表的打印):
avg sum: 2776.0, chunks: 2
chunk: [25, 8, 71, 39, 5, 69, 29, 64, 31, 2, 90, 73, 72, 58, 52, 19, 64, 34, 16, 8, 16, 89, 70, 67, 63, 36, 9, 87, 38, 33, 22, 73, 66, 93, 46, 48, 65, 55, 81, 92, 69, 94, 43, 68, 98, 70, 28, 99, 92, 69, 24, 74], sum: 2806
chunk: [55, 55, 64, 93, 97, 53, 85, 100, 66, 61, 5, 98, 43, 74, 99, 56, 96, 74, 63, 6, 89, 82, 8, 25, 36, 68, 89, 84, 10, 46, 95, 41, 54, 39, 21, 24, 8, 82, 72, 51, 31, 48, 33, 77, 17, 69, 50, 54], sum: 2746
---
avg sum: 1047.6, chunks: 5
chunk: [19, 76, 96, 78, 12, 33, 94, 10, 38, 87, 44, 76, 28, 18, 26, 29, 44, 98, 44, 32, 80], sum: 1062
chunk: [48, 70, 42, 85, 87, 55, 44, 11, 50, 48, 47, 50, 1, 17, 93, 78, 25, 10, 89, 57, 85], sum: 1092
chunk: [30, 83, 99, 62, 48, 66, 65, 98, 94, 54, 14, 97, 58, 53, 3, 98], sum: 1022
chunk: [80, 34, 63, 20, 27, 36, 98, 97, 7, 6, 9, 65, 91, 93, 2, 27, 83, 35, 65, 17, 26, 41], sum: 1022
chunk: [80, 80, 42, 32, 44, 42, 94, 31, 50, 23, 34, 84, 47, 10, 54, 59, 72, 80, 6, 76], sum: 1040
---
avg sum: 474.6, chunks: 10
chunk: [4, 41, 47, 41, 32, 51, 81, 5, 3, 37, 40, 26, 10, 70], sum: 488
chunk: [54, 8, 91, 42, 35, 80, 13, 84, 14, 23, 59], sum: 503
chunk: [39, 4, 38, 40, 88, 69, 10, 19, 28, 97, 81], sum: 513
chunk: [19, 55, 21, 63, 99, 93, 39, 47, 29], sum: 465
chunk: [65, 88, 12, 94, 7, 47, 14, 55, 28, 9, 98], sum: 517
chunk: [19, 1, 98, 84, 92, 99, 11, 53], sum: 457
chunk: [85, 79, 69, 78, 44, 6, 19, 53], sum: 433
chunk: [59, 20, 64, 55, 2, 65, 44, 90, 37, 26], sum: 462
chunk: [78, 66, 32, 76, 59, 47, 82], sum: 440
chunk: [34, 56, 66, 27, 1, 100, 16, 5, 97, 33, 33], sum: 468
---
avg sum: 182.48, chunks: 25
chunk: [55, 6, 16, 42, 85], sum: 204
chunk: [30, 68, 3, 94], sum: 195
chunk: [68, 96, 23], sum: 187
chunk: [69, 19, 12, 97], sum: 197
chunk: [59, 88, 49], sum: 196
chunk: [1, 16, 13, 12, 61, 77], sum: 180
chunk: [49, 75, 44, 43], sum: 211
chunk: [34, 86, 9, 55], sum: 184
chunk: [25, 82, 12, 93], sum: 212
chunk: [32, 74, 53, 31], sum: 190
chunk: [13, 15, 26, 31, 35, 3, 14, 71], sum: 208
chunk: [81, 92], sum: 173
chunk: [94, 21, 34, 71], sum: 220
chunk: [1, 55, 70, 3, 92], sum: 221
chunk: [38, 59, 56, 57], sum: 210
chunk: [7, 20, 10, 81, 100], sum: 218
chunk: [5, 71, 19, 8, 82], sum: 185
chunk: [95, 14, 72], sum: 181
chunk: [2, 8, 4, 47, 75, 17], sum: 153
chunk: [56, 69, 42], sum: 167
chunk: [75, 45], sum: 120
chunk: [68, 60], sum: 128
chunk: [29, 25, 62, 3, 50], sum: 169
chunk: [54, 63], sum: 117
chunk: [57, 37, 42], sum: 136
如您所见,正如预期的那样,您想要生成的块越多,情况就会变得越糟。我希望我能帮上一点忙。
编辑:yield from
语法需要 Python 3.3 或更高版本,如果您使用的是旧版本,只需将语句转换为普通的 for 循环即可。
简单明了的使用numpy的方法。假设
import numpy.random as nr
import numpy as np
a = (nr.random(10000000)*1000).astype(int)
然后,假设您需要将列表分成 p
个部分,总和
def equisum_partition(arr,p):
ac = arr.cumsum()
#sum of the entire array
partsum = ac[-1]//p
#generates the cumulative sums of each part
cumpartsums = np.array(range(1,p))*partsum
#finds the indices where the cumulative sums are sandwiched
inds = np.searchsorted(ac,cumpartsums)
#split into approximately equal-sum arrays
parts = np.split(arr,inds)
return parts
重要的是,这是矢量化的:
In [3]: %timeit parts = equisum_partition(a,20)
53.5 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
您可以检查拆分的质量,
partsums = np.array([part.sum() for part in parts]).std()
分割不是很好,但我怀疑它们是最优的,因为顺序没有改变。
这是@Milind R 的 numpy-approach 的略微编辑版本(顺便说一句,先生,非常感谢)。也就是说,我意识到在现实生活场景中,如果元素的值未 "uniformly" 分布在数组中,则脚本建议的分区最终可能不是最优的。为了解决这个问题,我通过重新排列 'smallest'、'largest'、'second smallest'、'second largest' 等元素来 "uniformified" 数组。下面的部分是这个使脚本大大(〜5x)慢。
import numpy.random as nr
import numpy as np
a = (nr.random(10000000)*1000).astype(int)
修改后的分区算法:
def equisum_partition(arr,p, uniformify=True):
#uniformify: rearrange to ['smallest', 'largest', 'second smallest', 'second largest', etc..]
if uniformify:
l = len(arr)
odd = l%2!=0
arr = np.sort(arr)
#add a dummy element if odd length
if odd:
arr = np.append(np.min(arr)-1, arr)
l = l+1
idx = np.arange(l)
idx = np.multiply(idx,
np.subtract(1,
np.multiply(
np.mod(idx, 2),
2))
)
arr = arr[idx]
#remove the dummy element
if odd:
arr = arr[1:]
#cumulative summation
ac = arr.cumsum()
#sum of the entire array
partsum = ac[-1]//p
#generates the cumulative sums of each part
cumpartsums = np.array(range(1,p))*partsum
#finds the indices where the cumulative sums are sandwiched
inds = np.searchsorted(ac,cumpartsums)
#split into approximately equal-sum arrays
parts = np.split(arr,inds)
return parts
在原始答案的示例中,由于示例数组的随机性,这并没有起到太大的作用。
统一化:
%%time
parts = equisum_partition(a,20)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 624 ms
266.6111212984185
没有统一化:
%%time
parts = equisum_partition(a,20, uniformify=False)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 105 ms
331.19071544957296
这不是标准的分区问题,因为我需要维护列表中元素的顺序。
例如,如果我有一个列表
[1, 6, 2, 3, 4, 1, 7, 6, 4]
我想要两个块,那么拆分应该给
[[1, 6, 2, 3, 4, 1], [7, 6, 4]]
每边总和为 17。对于三个块,结果将是
[[1, 6, 2, 3], [4, 1, 7], [6, 4]]
对于 12、12 和 10 的总和。
编辑补充说明
我目前将总和除以块数并将其用作目标,然后迭代直到接近该目标。问题是某些数据集可能会搞乱算法,例如试图将以下内容分成 3:-
[95, 15, 75, 25, 85, 5]
总和为 300,目标为 100。第一个块的总和为 95,第二个总和为 90,第三个总和为 110,5 为 'leftover'。将它附加到它应该出现的位置将得到 95、90、115,其中更多 'reasonable' 的解决方案将是 110、100、90。
结束编辑
背景:
我有一个包含不同高度的文本(歌词)的列表,我想将文本分成任意数量的列。目前我根据所有行的总高度计算目标高度,但显然这是一个一致的低估,在某些情况下会导致次优解决方案(最后一列明显更高)。
如果有两个所需的子列表,我可能会这样解决这个问题。它可能没有它应该的那样有效,但它是第一次削减。
def divide(l):
total = sum(l)
half = total / 2
l1 = []
l2 = []
for e in l:
if half - e >= 0 or half > abs(half - e):
l1.append(e)
half -= e
else:
l2.append(e)
return (l1, l2)
您可以在此处查看实际效果:
(l1, l2) = divide([1, 6, 2, 3, 4, 1, 7, 6, 4])
print(l1)
# [1, 6, 2, 3, 4, 1]
print(l2)
#[7, 6, 4]
(l1, l2) = divide([1,1,10,10])
print(l1)
# [1, 1, 10]
print(l2)
#[10]
我会把其他案例留给你作为练习。 :)
我认为一个好的方法是对输入列表进行排序。然后将最小和最大添加到一个列表中。第二小和第二大到下一个列表以此类推,直到所有元素都添加到列表中。
def divide_list(A):
A.sort()
l = 0
r = len(A) - 1
l1,l2= [],[]
i = 0
while l < r:
ends = [A[l], A[r]]
if i %2 ==0:
l1.extend(ends)
else:
l2.extend(ends)
i +=1
l +=1
r -=1
if r == l:
smaller = l1 if sum(l1) < sum(l2) else l2
smaller.append(A[r])
return l1, l2
myList = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print divide_list(myList)
myList = [1,10,10,1]
print divide_list(myList)
输出
([1, 7, 2, 6], [1, 6, 3, 4, 4])
([1, 10], [1, 10])
这有点晚了,但我想出了一个函数来做你需要的,它需要第二个参数来告诉它应该如何拆分列表
import math
my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]
def partition(my_list, split):
solution = []
total = sum(my_list)
div = total / split
div = math.ceil(div)
criteria = [div] * (total // div)
criteria.append(total - sum(criteria)) if sum(criteria) != total else criteria
temp = []
pivot = 0
for crit in criteria:
for count in range(len(my_list) + 1):
if sum(my_list[pivot:count]) == crit:
solution.append(my_list[pivot:count])
pivot = count
break
return solution
print(partition(my_list, 2)) # Outputs [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
print(partition(my_list, 3)) # Outputs [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
它会失败 4 个分区,因为你在问题中明确表示要维持订单
4 divisions = [9, 9, 9, 7]
而你的序列不匹配
这种方法定义分区边界,将数组分成大致相等数量的元素,然后反复搜索更好的分区,直到找不到更多为止。它不同于大多数其他已发布的解决方案,因为它希望通过尝试多个不同的分区来找到最佳解决方案。其他解决方案试图在单次遍历数组时创建良好的分区,但我想不出保证最优的单次遍历算法。
此处的代码是此算法的有效实现,但可能难以理解,因此在末尾包含一个更具可读性的版本作为附录。
def partition_list(a, k):
if k <= 1: return [a]
if k >= len(a): return [[x] for x in a]
partition_between = [(i+1)*len(a)/k for i in range(k-1)]
average_height = float(sum(a))/k
best_score = None
best_partitions = None
count = 0
while True:
starts = [0]+partition_between
ends = partition_between+[len(a)]
partitions = [a[starts[i]:ends[i]] for i in range(k)]
heights = map(sum, partitions)
abs_height_diffs = map(lambda x: abs(average_height - x), heights)
worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
worst_height_diff = average_height - heights[worst_partition_index]
if best_score is None or abs(worst_height_diff) < best_score:
best_score = abs(worst_height_diff)
best_partitions = partitions
no_improvements_count = 0
else:
no_improvements_count += 1
if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
return best_partitions
count += 1
move = -1 if worst_height_diff < 0 else 1
bound_to_move = 0 if worst_partition_index == 0\
else k-2 if worst_partition_index == k-1\
else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
else worst_partition_index
direction = -1 if bound_to_move < worst_partition_index else 1
partition_between[bound_to_move] += move * direction
def print_best_partition(a, k):
print 'Partitioning {0} into {1} partitions'.format(a, k)
p = partition_list(a, k)
print 'The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p))
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)
b = [1, 10, 10, 1]
print_best_partition(b, 2)
import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)
d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)
根据您对此的处理方式,可能需要进行一些修改。例如,为了确定是否找到了最佳分区,当分区之间没有高度差时,该算法停止,它没有找到比连续 5 次以上迭代看到的最好的东西更好的东西,或者在 100 次之后总迭代次数作为一个包罗万象的停止点。您可能需要调整这些常量或使用不同的方案。如果你的身高形成了一个复杂的价值观景观,知道什么时候停止可能会遇到试图逃避局部最大值之类的经典问题。
输出
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]
Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]
Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]
Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]
Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]
编辑
添加了新的测试用例 [95, 15, 75, 25, 85, 5],此方法可以正确处理。
附录
此版本的算法更易于阅读和理解,但由于较少利用内置 Python 功能,因此较长一些。然而,它的执行时间似乎相当,甚至更快。
#partition list a into k partitions
def partition_list(a, k):
#check degenerate conditions
if k <= 1: return [a]
if k >= len(a): return [[x] for x in a]
#create a list of indexes to partition between, using the index on the
#left of the partition to indicate where to partition
#to start, roughly partition the array into equal groups of len(a)/k (note
#that the last group may be a different size)
partition_between = []
for i in range(k-1):
partition_between.append((i+1)*len(a)/k)
#the ideal size for all partitions is the total height of the list divided
#by the number of paritions
average_height = float(sum(a))/k
best_score = None
best_partitions = None
count = 0
no_improvements_count = 0
#loop over possible partitionings
while True:
#partition the list
partitions = []
index = 0
for div in partition_between:
#create partitions based on partition_between
partitions.append(a[index:div])
index = div
#append the last partition, which runs from the last partition divider
#to the end of the list
partitions.append(a[index:])
#evaluate the partitioning
worst_height_diff = 0
worst_partition_index = -1
for p in partitions:
#compare the partition height to the ideal partition height
height_diff = average_height - sum(p)
#if it's the worst partition we've seen, update the variables that
#track that
if abs(height_diff) > abs(worst_height_diff):
worst_height_diff = height_diff
worst_partition_index = partitions.index(p)
#if the worst partition from this run is still better than anything
#we saw in previous iterations, update our best-ever variables
if best_score is None or abs(worst_height_diff) < best_score:
best_score = abs(worst_height_diff)
best_partitions = partitions
no_improvements_count = 0
else:
no_improvements_count += 1
#decide if we're done: if all our partition heights are ideal, or if
#we haven't seen improvement in >5 iterations, or we've tried 100
#different partitionings
#the criteria to exit are important for getting a good result with
#complex data, and changing them is a good way to experiment with getting
#improved results
if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
return best_partitions
count += 1
#adjust the partitioning of the worst partition to move it closer to the
#ideal size. the overall goal is to take the worst partition and adjust
#its size to try and make its height closer to the ideal. generally, if
#the worst partition is too big, we want to shrink the worst partition
#by moving one of its ends into the smaller of the two neighboring
#partitions. if the worst partition is too small, we want to grow the
#partition by expanding the partition towards the larger of the two
#neighboring partitions
if worst_partition_index == 0: #the worst partition is the first one
if worst_height_diff < 0: partition_between[0] -= 1 #partition too big, so make it smaller
else: partition_between[0] += 1 #partition too small, so make it bigger
elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
if worst_height_diff < 0: partition_between[-1] += 1 #partition too small, so make it bigger
else: partition_between[-1] -= 1 #partition too big, so make it smaller
else: #the worst partition is in the middle somewhere
left_bound = worst_partition_index - 1 #the divider before the partition
right_bound = worst_partition_index #the divider after the partition
if worst_height_diff < 0: #partition too big, so make it smaller
if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the right bigger
partition_between[right_bound] -= 1
else: #the partition on the left is smaller than the one on the right, so make the one on the left bigger
partition_between[left_bound] += 1
else: #partition too small, make it bigger
if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
partition_between[left_bound] -= 1
else: #the partition on the left is smaller than the one on the right, so make the one on the right smaller
partition_between[right_bound] += 1
def print_best_partition(a, k):
#simple function to partition a list and print info
print ' Partitioning {0} into {1} partitions'.format(a, k)
p = partition_list(a, k)
print ' The best partitioning is {0}\n With heights {1}\n'.format(p, map(sum, p))
#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2)
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)
b = [1, 10, 10, 1]
print_best_partition(b, 2)
import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)
d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)
这里有一些代码 returns 每个子列表的 2-ples 切片索引。
weights = [1, 6, 2, 3, 4, 1, 7, 6, 4]
def balance_partitions(weights:list, n:int=2) -> tuple:
if n < 1:
raise ValueError("Parameter 'n' must be 2+")
target = sum(weights) // n
results = []
cost = 0
start = 0
for i, w in enumerate(weights):
delta = target - cost
cost += w
if cost >= target:
if i == 0 or cost - target <= delta:
results.append( (start, i+1) )
start = i+1
elif cost - target > delta:
# Better if we didn't include this one.
results.append( (start, i) )
start = i
cost -= target
if len(results) == n-1:
results.append( (start, len(weights)) )
break
return tuple(results)
def print_parts(w, n):
result = balance_partitions(w, n)
print("Suggested partition indices: ", result)
for t in result:
start,end = t
sublist = w[start:end]
print(" - ", sublist, "(sum: {})".format(sum(sublist)))
print(weights, '=', sum(weights))
for i in range(2, len(weights)+1):
print_parts(weights, i)
输出为:
[1, 6, 2, 3, 4, 1, 7, 6, 4] = 34
Suggested partition indices: ((0, 6), (6, 9))
- [1, 6, 2, 3, 4, 1] (sum: 17)
- [7, 6, 4] (sum: 17)
Suggested partition indices: ((0, 4), (4, 7), (7, 9))
- [1, 6, 2, 3] (sum: 12)
- [4, 1, 7] (sum: 12)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 3), (3, 5), (5, 7), (7, 9))
- [1, 6, 2] (sum: 9)
- [3, 4] (sum: 7)
- [1, 7] (sum: 8)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 4), (4, 6), (6, 7), (7, 9))
- [1, 6] (sum: 7)
- [2, 3] (sum: 5)
- [4, 1] (sum: 5)
- [7] (sum: 7)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 9))
- [1, 6] (sum: 7)
- [2] (sum: 2)
- [3, 4] (sum: 7)
- [1] (sum: 1)
- [7] (sum: 7)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 9))
- [1, 6] (sum: 7)
- [2] (sum: 2)
- [3] (sum: 3)
- [4] (sum: 4)
- [1] (sum: 1)
- [7] (sum: 7)
- [6, 4] (sum: 10)
Suggested partition indices: ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
- [1, 6] (sum: 7)
- [2] (sum: 2)
- [3] (sum: 3)
- [4] (sum: 4)
- [1] (sum: 1)
- [7] (sum: 7)
- [6] (sum: 6)
- [4] (sum: 4)
Suggested partition indices: ((0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
- [1] (sum: 1)
- [6] (sum: 6)
- [2] (sum: 2)
- [3] (sum: 3)
- [4] (sum: 4)
- [1] (sum: 1)
- [7] (sum: 7)
- [6] (sum: 6)
- [4] (sum: 4)
这是我目前得到的最好的 O(n) 贪心算法。 这个想法是贪婪地将列表中的项目附加到一个块,直到当前块的总和超过该点块的平均预期总和。平均预期总和不断更新。这个解决方案并不完美,但正如我所说,它是 O(n) 并且在我的测试中效果不错。我渴望听到反馈和改进建议。
我在代码中留下了调试打印语句以提供一些文档。请随时评论它们以查看每个步骤中发生的情况。
代码
def split_list(lst, chunks):
#print(lst)
#print()
chunks_yielded = 0
total_sum = sum(lst)
avg_sum = total_sum/float(chunks)
chunk = []
chunksum = 0
sum_of_seen = 0
for i, item in enumerate(lst):
#print('start of loop! chunk: {}, index: {}, item: {}, chunksum: {}'.format(chunk, i, item, chunksum))
if chunks - chunks_yielded == 1:
#print('must yield the rest of the list! chunks_yielded: {}'.format(chunks_yielded))
yield chunk + lst[i:]
raise StopIteration
to_yield = chunks - chunks_yielded
chunks_left = len(lst) - i
if to_yield > chunks_left:
#print('must yield remaining list in single item chunks! to_yield: {}, chunks_left: {}'.format(to_yield, chunks_left))
if chunk:
yield chunk
yield from ([x] for x in lst[i:])
raise StopIteration
sum_of_seen += item
if chunksum < avg_sum:
#print('appending {} to chunk {}'.format(item, chunk))
chunk.append(item)
chunksum += item
else:
#print('yielding chunk {}'.format(chunk))
yield chunk
# update average expected sum, because the last yielded chunk was probably not perfect:
avg_sum = (total_sum - sum_of_seen)/(to_yield - 1)
chunks_yielded += 1
chunksum = item
chunk = [item]
测试代码
import random
lst = [1, 6, 2, 3, 4, 1, 7, 6, 4]
#lst = [random.choice(range(1,101)) for _ in range(100)]
chunks = 3
print('list: {}, avg sum: {}, chunks: {}\n'.format(lst, sum(lst)/float(chunks), chunks))
for chunk in split_list(lst, chunks):
print('chunk: {}, sum: {}'.format(chunk, sum(chunk)))
TESTS 与您的列表:
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 17.0, chunks: 2
chunk: [1, 6, 2, 3, 4, 1], sum: 17
chunk: [7, 6, 4], sum: 17
---
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 11.33, chunks: 3
chunk: [1, 6, 2, 3], sum: 12
chunk: [4, 1, 7], sum: 12
chunk: [6, 4], sum: 10
---
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 8.5, chunks: 4
chunk: [1, 6, 2], sum: 9
chunk: [3, 4, 1], sum: 8
chunk: [7], sum: 7
chunk: [6, 4], sum: 10
---
list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 6.8, chunks: 5
chunk: [1, 6], sum: 7
chunk: [2, 3, 4], sum: 9
chunk: [1, 7], sum: 8
chunk: [6], sum: 6
chunk: [4], sum: 4
TESTS,随机列表长度为 100,元素从 1 到 100(省略随机列表的打印):
avg sum: 2776.0, chunks: 2
chunk: [25, 8, 71, 39, 5, 69, 29, 64, 31, 2, 90, 73, 72, 58, 52, 19, 64, 34, 16, 8, 16, 89, 70, 67, 63, 36, 9, 87, 38, 33, 22, 73, 66, 93, 46, 48, 65, 55, 81, 92, 69, 94, 43, 68, 98, 70, 28, 99, 92, 69, 24, 74], sum: 2806
chunk: [55, 55, 64, 93, 97, 53, 85, 100, 66, 61, 5, 98, 43, 74, 99, 56, 96, 74, 63, 6, 89, 82, 8, 25, 36, 68, 89, 84, 10, 46, 95, 41, 54, 39, 21, 24, 8, 82, 72, 51, 31, 48, 33, 77, 17, 69, 50, 54], sum: 2746
---
avg sum: 1047.6, chunks: 5
chunk: [19, 76, 96, 78, 12, 33, 94, 10, 38, 87, 44, 76, 28, 18, 26, 29, 44, 98, 44, 32, 80], sum: 1062
chunk: [48, 70, 42, 85, 87, 55, 44, 11, 50, 48, 47, 50, 1, 17, 93, 78, 25, 10, 89, 57, 85], sum: 1092
chunk: [30, 83, 99, 62, 48, 66, 65, 98, 94, 54, 14, 97, 58, 53, 3, 98], sum: 1022
chunk: [80, 34, 63, 20, 27, 36, 98, 97, 7, 6, 9, 65, 91, 93, 2, 27, 83, 35, 65, 17, 26, 41], sum: 1022
chunk: [80, 80, 42, 32, 44, 42, 94, 31, 50, 23, 34, 84, 47, 10, 54, 59, 72, 80, 6, 76], sum: 1040
---
avg sum: 474.6, chunks: 10
chunk: [4, 41, 47, 41, 32, 51, 81, 5, 3, 37, 40, 26, 10, 70], sum: 488
chunk: [54, 8, 91, 42, 35, 80, 13, 84, 14, 23, 59], sum: 503
chunk: [39, 4, 38, 40, 88, 69, 10, 19, 28, 97, 81], sum: 513
chunk: [19, 55, 21, 63, 99, 93, 39, 47, 29], sum: 465
chunk: [65, 88, 12, 94, 7, 47, 14, 55, 28, 9, 98], sum: 517
chunk: [19, 1, 98, 84, 92, 99, 11, 53], sum: 457
chunk: [85, 79, 69, 78, 44, 6, 19, 53], sum: 433
chunk: [59, 20, 64, 55, 2, 65, 44, 90, 37, 26], sum: 462
chunk: [78, 66, 32, 76, 59, 47, 82], sum: 440
chunk: [34, 56, 66, 27, 1, 100, 16, 5, 97, 33, 33], sum: 468
---
avg sum: 182.48, chunks: 25
chunk: [55, 6, 16, 42, 85], sum: 204
chunk: [30, 68, 3, 94], sum: 195
chunk: [68, 96, 23], sum: 187
chunk: [69, 19, 12, 97], sum: 197
chunk: [59, 88, 49], sum: 196
chunk: [1, 16, 13, 12, 61, 77], sum: 180
chunk: [49, 75, 44, 43], sum: 211
chunk: [34, 86, 9, 55], sum: 184
chunk: [25, 82, 12, 93], sum: 212
chunk: [32, 74, 53, 31], sum: 190
chunk: [13, 15, 26, 31, 35, 3, 14, 71], sum: 208
chunk: [81, 92], sum: 173
chunk: [94, 21, 34, 71], sum: 220
chunk: [1, 55, 70, 3, 92], sum: 221
chunk: [38, 59, 56, 57], sum: 210
chunk: [7, 20, 10, 81, 100], sum: 218
chunk: [5, 71, 19, 8, 82], sum: 185
chunk: [95, 14, 72], sum: 181
chunk: [2, 8, 4, 47, 75, 17], sum: 153
chunk: [56, 69, 42], sum: 167
chunk: [75, 45], sum: 120
chunk: [68, 60], sum: 128
chunk: [29, 25, 62, 3, 50], sum: 169
chunk: [54, 63], sum: 117
chunk: [57, 37, 42], sum: 136
如您所见,正如预期的那样,您想要生成的块越多,情况就会变得越糟。我希望我能帮上一点忙。
编辑:yield from
语法需要 Python 3.3 或更高版本,如果您使用的是旧版本,只需将语句转换为普通的 for 循环即可。
简单明了的使用numpy的方法。假设
import numpy.random as nr
import numpy as np
a = (nr.random(10000000)*1000).astype(int)
然后,假设您需要将列表分成 p
个部分,总和
def equisum_partition(arr,p):
ac = arr.cumsum()
#sum of the entire array
partsum = ac[-1]//p
#generates the cumulative sums of each part
cumpartsums = np.array(range(1,p))*partsum
#finds the indices where the cumulative sums are sandwiched
inds = np.searchsorted(ac,cumpartsums)
#split into approximately equal-sum arrays
parts = np.split(arr,inds)
return parts
重要的是,这是矢量化的:
In [3]: %timeit parts = equisum_partition(a,20)
53.5 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
您可以检查拆分的质量,
partsums = np.array([part.sum() for part in parts]).std()
分割不是很好,但我怀疑它们是最优的,因为顺序没有改变。
这是@Milind R 的 numpy-approach 的略微编辑版本(顺便说一句,先生,非常感谢)。也就是说,我意识到在现实生活场景中,如果元素的值未 "uniformly" 分布在数组中,则脚本建议的分区最终可能不是最优的。为了解决这个问题,我通过重新排列 'smallest'、'largest'、'second smallest'、'second largest' 等元素来 "uniformified" 数组。下面的部分是这个使脚本大大(〜5x)慢。
import numpy.random as nr
import numpy as np
a = (nr.random(10000000)*1000).astype(int)
修改后的分区算法:
def equisum_partition(arr,p, uniformify=True):
#uniformify: rearrange to ['smallest', 'largest', 'second smallest', 'second largest', etc..]
if uniformify:
l = len(arr)
odd = l%2!=0
arr = np.sort(arr)
#add a dummy element if odd length
if odd:
arr = np.append(np.min(arr)-1, arr)
l = l+1
idx = np.arange(l)
idx = np.multiply(idx,
np.subtract(1,
np.multiply(
np.mod(idx, 2),
2))
)
arr = arr[idx]
#remove the dummy element
if odd:
arr = arr[1:]
#cumulative summation
ac = arr.cumsum()
#sum of the entire array
partsum = ac[-1]//p
#generates the cumulative sums of each part
cumpartsums = np.array(range(1,p))*partsum
#finds the indices where the cumulative sums are sandwiched
inds = np.searchsorted(ac,cumpartsums)
#split into approximately equal-sum arrays
parts = np.split(arr,inds)
return parts
在原始答案的示例中,由于示例数组的随机性,这并没有起到太大的作用。
统一化:
%%time
parts = equisum_partition(a,20)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 624 ms
266.6111212984185
没有统一化:
%%time
parts = equisum_partition(a,20, uniformify=False)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 105 ms
331.19071544957296