将一个集合划分为两个子集,使得和的差值最小并且return两个子集
Partition a set into two subsets so that the difference of the sum is minimum and return the two subsets
有很多与此相关的问题,例如.
然而,所有的答案都集中在寻找最小绝对和上。我正在尝试使用各种答案中列出的一些方法,但我不太关心返回总和,而更关心返回两个子集(如果有多个解决方案,则返回第一个找到的最佳子集)。
有没有办法做到这一点而不是 NP-hard?
我的用例是我有一组不同高度的元素,我需要将这些元素排列成两列(两个数组)以最小化元素堆叠在旁边时的总最大高度彼此。我有一个解决方案在大多数情况下都能给出不错的结果,但它并不总是准确的。我希望不必计算两组的每一个组合以获得理想的解决方案。
*我知道下面对从数组中删除项目和计算总和进行了优化。
// Input is always ordered by size
let inputWorking = [{
item: 'a',
size: 5
},
{
item: 'b',
size: 6
},
{
item: 'c',
size: 7
},
{
item: 'd',
size: 8
},
];
let inputNotWorking = [{
item: 'a',
size: 14
},
{
item: 'b',
size: 14
},
{
item: 'c',
size: 20
},
{
item: 'd',
size: 20
},
{
item: 'e',
size: 21
},
];
console.log('Output is correct', pushSides(inputWorking));
// Best fit would be [a,b,c], [d,e] with a max height of 48
console.log('Output is incorrect', pushSides(inputNotWorking));
function pushSides(items, ls = [], rs = []) {
if (items.length === 0) {
const leftSize = this.sumSizes(ls);
const rightSize = this.sumSizes(rs);
return {
left: ls,
right: rs,
maxHeight: leftSize > rightSize ? leftSize : rightSize
};
}
const lastItem = items[items.length - 1];
const result = this.pushToIdealSide(lastItem, ls, rs);
// Remove item we used
items = items.filter(t => t !== lastItem);
return pushSides(items, result.ls, result.rs);
}
function pushToIdealSide(nextItem, ls = [], rs = []) {
if (this.sumSizes(rs) + nextItem.size > this.sumSizes(ls) + nextItem.size) {
ls.push(nextItem);
} else {
rs.push(nextItem);
}
return {
ls,
rs
};
}
function sumSizes(itemSizeArray) {
return itemSizeArray.map(c => c.size)
.reduce((prev, curr) => prev + curr, 0);
}
这是 partition problem 的函数版本,因此它是 FNP 完全的(因此至少与 NP 完全一样难)。您还可以将问题表述为 Subset-Sum 的函数版本。幸运的是,它有一个伪多项式解并且在实践中通常可以快速求解。
考虑问题的等效形式更有用:给定整数的多重集 S
,其总和为 T
,找到 S
的子集,其总和为大多数 T/2
并尽可能接近 T/2
。该子集只是您问题中总和较小的子集,因此另一个子集是 S
.
的其余部分
给定一个算法(比如你 link 在另一个 post 中)只能找到最优和最多一半(或者,等价地,最小绝对差),通常有一种直接的方法来修改它以获得实际的子集。生成子集和列表时,还存储生成该子集和的元素的索引。然后,最后,我们可以从最终找到的总和回溯以恢复元素。
举个简单的例子,如果我们的数组是 [1, 5, 8],我们可以通过一次添加一个元素来生成所有子集和:
Sums-Dict: Hashmap from subset-sums to last added element for that sum.
Initial Sums: {0: null}
Add element 1:
Sums-Dict = {0: null, 1: 1}
Add element 5:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5}
Add element 8:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5, 8: 8, 9: 8, 13: 8, 14: 8}
然后回溯,我们使用类似于背包问题回溯的过程输出解:
Find a closest sum to (1+5+8)/2, for example '6'.
Backtracking:
Used-elements = [], Current Sum = 6.
Sums-Dict[6] = 5: add 5 to used-elements, subtract 5 from current sum
Used-elements = [5], Current Sum = 1.
Sums-Dict[1] = 1: add 1 to used-elements, subtract 1 from current sum
Used-elements = [5, 1], Current Sum = 0.
Sums-Dict[0] = null: stop. We have found the subset that summed to '6'.
可以对经典的动态规划解决方案进行微不足道的修改,以存储每个总和的额外信息;如果数字很小,这是您最好的选择。我已经包含了一个 Python 子集和的实现,其中包含一个基于 the Schroeppel-Shamir 论文的核心算法和符号。这是子集求和的天真 inclusion/exclusion 解决方案的中间相遇版本。它比蛮力方法更复杂,但运行时间为 O(2^(n/2) * n/4)
,需要 O(2^(n/4))
space,因此对于更大的输入来说是一个实用的解决方案。
from typing import Dict, List, Tuple
import collections
import math
import heapq
class SubsetSumSolver:
"""Solves the subset-sum and partition optimization problems.
Useful when values or goal sum are too large for dynamic programming"""
def __init__(self, nums: List[int]):
# Strip all zeros. Not necessary, but a useful optimization for speed
self.orig_zeros = nums.count(0)
self.nums = sorted(x for x in nums if x != 0)
self.n = len(self.nums)
def all_subset_sums(self, left_bound: int, right_bound: int) -> Dict[int, int]:
"""Return a subset-sum dictionary, mapping subset-sums of
nums[left_bound:right_bound] to any index of an element in that subset-sum."""
all_sums = {0: self.n + 1}
for i in range(left_bound, right_bound):
# Want old sums to remain/take priority
new_sums = {self.nums[i] + elem: i for elem in all_sums}
new_sums.update(all_sums)
all_sums = new_sums
return all_sums
def recover_sum_members(self, sum_dict: Dict[int, int], found_sum: int) -> List[int]:
"""Given a subset-sum dictionary and a sum, return a set of elements
from nums that formed that sum."""
answer = []
curr_sum = found_sum
while curr_sum != 0:
next_elem_index = sum_dict[curr_sum]
next_elem = self.nums[next_elem_index]
answer.append(next_elem)
curr_sum -= next_elem
assert len(answer) <= self.n
return answer
def min_absolute_difference(self, goal: float) -> List[int]:
"""Implement Schroeppel and Shamir alg. for subset sum
Runs in O(2^(n/2) * n/4) time, takes O(2^(n/4)) space
Returns a subset of self.nums whose sum is as close to goal as possible.
"""
if self.n < 8:
# Direct solution when n < 8; not worth splitting into 4 groups.
all_sums_dict = self.all_subset_sums(0, self.n)
best_diff_seen, best_sum = min(((abs(x - goal), x) for x in all_sums_dict),
key=lambda pair: pair[0])
return self.recover_sum_members(all_sums_dict, best_sum)
# Split nums into 4 equal-length parts (or as close as possible)
half = self.n // 2
bounds = [0, half // 2, half, half + (self.n - half) // 2, self.n]
# first_arr = nums[bounds[0]:bounds[1]]
# sec_arr = nums[bounds[1]:bounds[2]]
# third_arr = nums[bounds[2]:bounds[3]]
# fourth_arr = nums[bounds[3]:bounds[4]]
first_table_dict = self.all_subset_sums(bounds[0], bounds[1])
first_table = list(first_table_dict.keys())
sec_table_dict = self.all_subset_sums(bounds[1], bounds[2])
sec_table = sorted(sec_table_dict.keys())
third_table_dict = self.all_subset_sums(bounds[2], bounds[3])
third_table = list(third_table_dict.keys())
fourth_table_dict = self.all_subset_sums(bounds[3], bounds[4])
fourth_table = sorted(fourth_table_dict.keys(), reverse=True)
# min_heap stores pairs of problems from T1 and T2, and
# makes the pair with smallest sum in front of the heap
# Format: (sum, T1-index, T2-index) triplets
min_heap = [(x + sec_table[0], i, 0) for i, x in enumerate(first_table)]
# max_heap stores pairs of problems from T3 and T4, and
# makes the pair with largest sum in front of the heap
# Format: (-sum, T3-index, T4-index) triplets
max_heap = [(-(x + fourth_table[0]), i, 0) for i, x in enumerate(third_table)]
heapq.heapify(min_heap)
heapq.heapify(max_heap)
best_diff_seen = math.inf
best_diff_indices = []
while len(min_heap) != 0 and len(max_heap) != 0:
smallest, p1, p2 = min_heap[0]
largest, p3, p4 = max_heap[0]
largest = -largest
ans_here = smallest + largest
if abs(goal - ans_here) < best_diff_seen:
best_diff_seen = abs(goal - ans_here)
best_diff_indices = (p1, p2, p3, p4)
if ans_here <= goal: # Want sum to increase, so increase p2
heapq.heappop(min_heap)
if p2 + 1 != len(sec_table):
heapq.heappush(min_heap,
(first_table[p1] + sec_table[p2 + 1], p1, p2 + 1))
else: # Want sum to decrease. so increase p4
heapq.heappop(max_heap)
if p4 + 1 != len(fourth_table):
heapq.heappush(max_heap,
(-(third_table[p3] + fourth_table[p4 + 1]), p3, p4 + 1))
sum_ans = []
p1, p2, p3, p4 = best_diff_indices
sum_ans.extend(self.recover_sum_members(first_table_dict, first_table[p1]))
sum_ans.extend(self.recover_sum_members(sec_table_dict, sec_table[p2]))
sum_ans.extend(self.recover_sum_members(third_table_dict, third_table[p3]))
sum_ans.extend(self.recover_sum_members(fourth_table_dict, fourth_table[p4]))
return sum_ans
def solve_partition(self) -> Tuple[List[int], List[int]]:
"""Return a partition of nums into (smaller_sum_set, larger_sum_set)
Finds a partition whose sum-difference is minimum.
"""
total_sum = sum(self.nums)
frequency_counts = collections.Counter(self.nums)
first_subset = self.min_absolute_difference(goal=total_sum / 2.0)
if self.orig_zeros != 0:
first_subset.extend([0] * self.orig_zeros)
remaining_subset = frequency_counts - collections.Counter(first_subset)
remaining_subset = list(remaining_subset.elements())
if sum(first_subset) <= sum(remaining_subset):
return (first_subset, remaining_subset)
else:
return (remaining_subset, first_subset)
您可以在任何整数数组(最多 n=100 个元素)上这样调用它:
Solver = SubsetSumSolver([1, 5, 5, 6, 7, 10, 20])
print(Solver.solve_partition())
>>> ([10, 6, 5, 5, 1], [7, 20])
有很多与此相关的问题,例如
然而,所有的答案都集中在寻找最小绝对和上。我正在尝试使用各种答案中列出的一些方法,但我不太关心返回总和,而更关心返回两个子集(如果有多个解决方案,则返回第一个找到的最佳子集)。
有没有办法做到这一点而不是 NP-hard?
我的用例是我有一组不同高度的元素,我需要将这些元素排列成两列(两个数组)以最小化元素堆叠在旁边时的总最大高度彼此。我有一个解决方案在大多数情况下都能给出不错的结果,但它并不总是准确的。我希望不必计算两组的每一个组合以获得理想的解决方案。
*我知道下面对从数组中删除项目和计算总和进行了优化。
// Input is always ordered by size
let inputWorking = [{
item: 'a',
size: 5
},
{
item: 'b',
size: 6
},
{
item: 'c',
size: 7
},
{
item: 'd',
size: 8
},
];
let inputNotWorking = [{
item: 'a',
size: 14
},
{
item: 'b',
size: 14
},
{
item: 'c',
size: 20
},
{
item: 'd',
size: 20
},
{
item: 'e',
size: 21
},
];
console.log('Output is correct', pushSides(inputWorking));
// Best fit would be [a,b,c], [d,e] with a max height of 48
console.log('Output is incorrect', pushSides(inputNotWorking));
function pushSides(items, ls = [], rs = []) {
if (items.length === 0) {
const leftSize = this.sumSizes(ls);
const rightSize = this.sumSizes(rs);
return {
left: ls,
right: rs,
maxHeight: leftSize > rightSize ? leftSize : rightSize
};
}
const lastItem = items[items.length - 1];
const result = this.pushToIdealSide(lastItem, ls, rs);
// Remove item we used
items = items.filter(t => t !== lastItem);
return pushSides(items, result.ls, result.rs);
}
function pushToIdealSide(nextItem, ls = [], rs = []) {
if (this.sumSizes(rs) + nextItem.size > this.sumSizes(ls) + nextItem.size) {
ls.push(nextItem);
} else {
rs.push(nextItem);
}
return {
ls,
rs
};
}
function sumSizes(itemSizeArray) {
return itemSizeArray.map(c => c.size)
.reduce((prev, curr) => prev + curr, 0);
}
这是 partition problem 的函数版本,因此它是 FNP 完全的(因此至少与 NP 完全一样难)。您还可以将问题表述为 Subset-Sum 的函数版本。幸运的是,它有一个伪多项式解并且在实践中通常可以快速求解。
考虑问题的等效形式更有用:给定整数的多重集 S
,其总和为 T
,找到 S
的子集,其总和为大多数 T/2
并尽可能接近 T/2
。该子集只是您问题中总和较小的子集,因此另一个子集是 S
.
给定一个算法(比如你 link 在另一个 post 中)只能找到最优和最多一半(或者,等价地,最小绝对差),通常有一种直接的方法来修改它以获得实际的子集。生成子集和列表时,还存储生成该子集和的元素的索引。然后,最后,我们可以从最终找到的总和回溯以恢复元素。
举个简单的例子,如果我们的数组是 [1, 5, 8],我们可以通过一次添加一个元素来生成所有子集和:
Sums-Dict: Hashmap from subset-sums to last added element for that sum.
Initial Sums: {0: null}
Add element 1:
Sums-Dict = {0: null, 1: 1}
Add element 5:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5}
Add element 8:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5, 8: 8, 9: 8, 13: 8, 14: 8}
然后回溯,我们使用类似于背包问题回溯的过程输出解:
Find a closest sum to (1+5+8)/2, for example '6'.
Backtracking:
Used-elements = [], Current Sum = 6.
Sums-Dict[6] = 5: add 5 to used-elements, subtract 5 from current sum
Used-elements = [5], Current Sum = 1.
Sums-Dict[1] = 1: add 1 to used-elements, subtract 1 from current sum
Used-elements = [5, 1], Current Sum = 0.
Sums-Dict[0] = null: stop. We have found the subset that summed to '6'.
可以对经典的动态规划解决方案进行微不足道的修改,以存储每个总和的额外信息;如果数字很小,这是您最好的选择。我已经包含了一个 Python 子集和的实现,其中包含一个基于 the Schroeppel-Shamir 论文的核心算法和符号。这是子集求和的天真 inclusion/exclusion 解决方案的中间相遇版本。它比蛮力方法更复杂,但运行时间为 O(2^(n/2) * n/4)
,需要 O(2^(n/4))
space,因此对于更大的输入来说是一个实用的解决方案。
from typing import Dict, List, Tuple
import collections
import math
import heapq
class SubsetSumSolver:
"""Solves the subset-sum and partition optimization problems.
Useful when values or goal sum are too large for dynamic programming"""
def __init__(self, nums: List[int]):
# Strip all zeros. Not necessary, but a useful optimization for speed
self.orig_zeros = nums.count(0)
self.nums = sorted(x for x in nums if x != 0)
self.n = len(self.nums)
def all_subset_sums(self, left_bound: int, right_bound: int) -> Dict[int, int]:
"""Return a subset-sum dictionary, mapping subset-sums of
nums[left_bound:right_bound] to any index of an element in that subset-sum."""
all_sums = {0: self.n + 1}
for i in range(left_bound, right_bound):
# Want old sums to remain/take priority
new_sums = {self.nums[i] + elem: i for elem in all_sums}
new_sums.update(all_sums)
all_sums = new_sums
return all_sums
def recover_sum_members(self, sum_dict: Dict[int, int], found_sum: int) -> List[int]:
"""Given a subset-sum dictionary and a sum, return a set of elements
from nums that formed that sum."""
answer = []
curr_sum = found_sum
while curr_sum != 0:
next_elem_index = sum_dict[curr_sum]
next_elem = self.nums[next_elem_index]
answer.append(next_elem)
curr_sum -= next_elem
assert len(answer) <= self.n
return answer
def min_absolute_difference(self, goal: float) -> List[int]:
"""Implement Schroeppel and Shamir alg. for subset sum
Runs in O(2^(n/2) * n/4) time, takes O(2^(n/4)) space
Returns a subset of self.nums whose sum is as close to goal as possible.
"""
if self.n < 8:
# Direct solution when n < 8; not worth splitting into 4 groups.
all_sums_dict = self.all_subset_sums(0, self.n)
best_diff_seen, best_sum = min(((abs(x - goal), x) for x in all_sums_dict),
key=lambda pair: pair[0])
return self.recover_sum_members(all_sums_dict, best_sum)
# Split nums into 4 equal-length parts (or as close as possible)
half = self.n // 2
bounds = [0, half // 2, half, half + (self.n - half) // 2, self.n]
# first_arr = nums[bounds[0]:bounds[1]]
# sec_arr = nums[bounds[1]:bounds[2]]
# third_arr = nums[bounds[2]:bounds[3]]
# fourth_arr = nums[bounds[3]:bounds[4]]
first_table_dict = self.all_subset_sums(bounds[0], bounds[1])
first_table = list(first_table_dict.keys())
sec_table_dict = self.all_subset_sums(bounds[1], bounds[2])
sec_table = sorted(sec_table_dict.keys())
third_table_dict = self.all_subset_sums(bounds[2], bounds[3])
third_table = list(third_table_dict.keys())
fourth_table_dict = self.all_subset_sums(bounds[3], bounds[4])
fourth_table = sorted(fourth_table_dict.keys(), reverse=True)
# min_heap stores pairs of problems from T1 and T2, and
# makes the pair with smallest sum in front of the heap
# Format: (sum, T1-index, T2-index) triplets
min_heap = [(x + sec_table[0], i, 0) for i, x in enumerate(first_table)]
# max_heap stores pairs of problems from T3 and T4, and
# makes the pair with largest sum in front of the heap
# Format: (-sum, T3-index, T4-index) triplets
max_heap = [(-(x + fourth_table[0]), i, 0) for i, x in enumerate(third_table)]
heapq.heapify(min_heap)
heapq.heapify(max_heap)
best_diff_seen = math.inf
best_diff_indices = []
while len(min_heap) != 0 and len(max_heap) != 0:
smallest, p1, p2 = min_heap[0]
largest, p3, p4 = max_heap[0]
largest = -largest
ans_here = smallest + largest
if abs(goal - ans_here) < best_diff_seen:
best_diff_seen = abs(goal - ans_here)
best_diff_indices = (p1, p2, p3, p4)
if ans_here <= goal: # Want sum to increase, so increase p2
heapq.heappop(min_heap)
if p2 + 1 != len(sec_table):
heapq.heappush(min_heap,
(first_table[p1] + sec_table[p2 + 1], p1, p2 + 1))
else: # Want sum to decrease. so increase p4
heapq.heappop(max_heap)
if p4 + 1 != len(fourth_table):
heapq.heappush(max_heap,
(-(third_table[p3] + fourth_table[p4 + 1]), p3, p4 + 1))
sum_ans = []
p1, p2, p3, p4 = best_diff_indices
sum_ans.extend(self.recover_sum_members(first_table_dict, first_table[p1]))
sum_ans.extend(self.recover_sum_members(sec_table_dict, sec_table[p2]))
sum_ans.extend(self.recover_sum_members(third_table_dict, third_table[p3]))
sum_ans.extend(self.recover_sum_members(fourth_table_dict, fourth_table[p4]))
return sum_ans
def solve_partition(self) -> Tuple[List[int], List[int]]:
"""Return a partition of nums into (smaller_sum_set, larger_sum_set)
Finds a partition whose sum-difference is minimum.
"""
total_sum = sum(self.nums)
frequency_counts = collections.Counter(self.nums)
first_subset = self.min_absolute_difference(goal=total_sum / 2.0)
if self.orig_zeros != 0:
first_subset.extend([0] * self.orig_zeros)
remaining_subset = frequency_counts - collections.Counter(first_subset)
remaining_subset = list(remaining_subset.elements())
if sum(first_subset) <= sum(remaining_subset):
return (first_subset, remaining_subset)
else:
return (remaining_subset, first_subset)
您可以在任何整数数组(最多 n=100 个元素)上这样调用它:
Solver = SubsetSumSolver([1, 5, 5, 6, 7, 10, 20])
print(Solver.solve_partition())
>>> ([10, 6, 5, 5, 1], [7, 20])