将一个集合划分为两个子集,使得和的差值最小并且return两个子集

Partition a set into two subsets so that the difference of the sum is minimum and return the two subsets

有很多与此相关的问题,例如.

然而,所有的答案都集中在寻找最小绝对和上。我正在尝试使用各种答案中列出的一些方法,但我不太关心返回总和,而更关心返回两个子集(如果有多个解决方案,则返回第一个找到的最佳子集)。

有没有办法做到这一点而不是 NP-hard?

我的用例是我有一组不同高度的元素,我需要将这些元素排列成两列(两个数组)以最小化元素堆叠在旁边时的总最大高度彼此。我有一个解决方案在大多数情况下都能给出不错的结果,但它并不总是准确的。我希望不必计算两组的每一个组合以获得理想的解决方案。

*我知道下面对从数组中删除项目和计算总和进行了优化。

// Input is always ordered by size
let inputWorking = [{
    item: 'a',
    size: 5
  },
  {
    item: 'b',
    size: 6
  },
  {
    item: 'c',
    size: 7
  },
  {
    item: 'd',
    size: 8
  },
];

let inputNotWorking = [{
    item: 'a',
    size: 14
  },
  {
    item: 'b',
    size: 14
  },
  {
    item: 'c',
    size: 20
  },
  {
    item: 'd',
    size: 20
  },
  {
    item: 'e',
    size: 21
  },
];


console.log('Output is correct', pushSides(inputWorking));

// Best fit would be [a,b,c], [d,e] with a max height of 48
console.log('Output is incorrect', pushSides(inputNotWorking));


function pushSides(items, ls = [], rs = []) {
  if (items.length === 0) {
    const leftSize = this.sumSizes(ls);
    const rightSize = this.sumSizes(rs);
    return {
      left: ls,
      right: rs,
      maxHeight: leftSize > rightSize ? leftSize : rightSize
    };
  }

  const lastItem = items[items.length - 1];
  const result = this.pushToIdealSide(lastItem, ls, rs);
  // Remove item we used
  items = items.filter(t => t !== lastItem);
  return pushSides(items, result.ls, result.rs);
}

function pushToIdealSide(nextItem, ls = [], rs = []) {
  if (this.sumSizes(rs) + nextItem.size > this.sumSizes(ls) + nextItem.size) {
    ls.push(nextItem);
  } else {
    rs.push(nextItem);
  }

  return {
    ls,
    rs
  };
}

function sumSizes(itemSizeArray) {
  return itemSizeArray.map(c => c.size)
    .reduce((prev, curr) => prev + curr, 0);
}

这是 partition problem 的函数版本,因此它是 FNP 完全的(因此至少与 NP 完全一样难)。您还可以将问题表述为 Subset-Sum 的函数版本。幸运的是,它有一个伪多项式解并且在实践中通常可以快速求解。

考虑问题的等效形式更有用:给定整数的多重集 S,其总和为 T,找到 S 的子集,其总和为大多数 T/2 并尽可能接近 T/2。该子集只是您问题中总和较小的子集,因此另一个子集是 S.

的其余部分

给定一个算法(比如你 link 在另一个 post 中)只能找到最优和最多一半(或者,等价地,最小绝对差),通常有一种直接的方法来修改它以获得实际的子集。生成子集和列表时,还存储生成该子集和的元素的索引。然后,最后,我们可以从最终找到的总和回溯以恢复元素。

举个简单的例子,如果我们的数组是 [1, 5, 8],我们可以通过一次添加一个元素来生成所有子集和:

Sums-Dict: Hashmap from subset-sums to last added element for that sum.
Initial Sums: {0: null}

Add element 1:
Sums-Dict = {0: null, 1: 1}

Add element 5:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5}

Add element 8:
Sums-Dict = {0: null, 1: 1, 5: 5, 6: 5, 8: 8, 9: 8, 13: 8, 14: 8}

然后回溯,我们使用类似于背包问题回溯的过程输出解:

Find a closest sum to (1+5+8)/2, for example '6'.

Backtracking:
Used-elements = [], Current Sum = 6.
Sums-Dict[6] = 5: add 5 to used-elements, subtract 5 from current sum

Used-elements = [5], Current Sum = 1.
Sums-Dict[1] = 1: add 1 to used-elements, subtract 1 from current sum


Used-elements = [5, 1], Current Sum = 0.
Sums-Dict[0] = null: stop. We have found the subset that summed to '6'.

可以对经典的动态规划解决方案进行微不足道的修改,以存储每个总和的额外信息;如果数字很小,这是您最好的选择。我已经包含了一个 Python 子集和的实现,其中包含一个基于 the Schroeppel-Shamir 论文的核心算法和符号。这是子集求和的天真 inclusion/exclusion 解决方案的中间相遇版本。它比蛮力方法更复杂,但运行时间为 O(2^(n/2) * n/4),需要 O(2^(n/4)) space,因此对于更大的输入来说是一个实用的解决方案。

from typing import Dict, List, Tuple
import collections
import math
import heapq


class SubsetSumSolver:
    """Solves the subset-sum and partition optimization problems.
    Useful when values or goal sum are too large for dynamic programming"""

    def __init__(self, nums: List[int]):

        # Strip all zeros. Not necessary, but a useful optimization for speed
        self.orig_zeros = nums.count(0)
        self.nums = sorted(x for x in nums if x != 0)

        self.n = len(self.nums)

    def all_subset_sums(self, left_bound: int, right_bound: int) -> Dict[int, int]:
        """Return a subset-sum dictionary, mapping subset-sums of
        nums[left_bound:right_bound] to any index of an element in that subset-sum."""

        all_sums = {0: self.n + 1}
        for i in range(left_bound, right_bound):
            # Want old sums to remain/take priority
            new_sums = {self.nums[i] + elem: i for elem in all_sums}
            new_sums.update(all_sums)
            all_sums = new_sums

        return all_sums

    def recover_sum_members(self, sum_dict: Dict[int, int], found_sum: int) -> List[int]:
        """Given a subset-sum dictionary and a sum, return a set of elements
        from nums that formed that sum."""

        answer = []
        curr_sum = found_sum
        while curr_sum != 0:
            next_elem_index = sum_dict[curr_sum]
            next_elem = self.nums[next_elem_index]
            answer.append(next_elem)
            curr_sum -= next_elem

            assert len(answer) <= self.n

        return answer

    def min_absolute_difference(self, goal: float) -> List[int]:
        """Implement Schroeppel and Shamir alg. for subset sum
        Runs in O(2^(n/2) * n/4) time, takes O(2^(n/4)) space
        Returns a subset of self.nums whose sum is as close to goal as possible.
        """

        if self.n < 8:
            # Direct solution when n < 8; not worth splitting into 4 groups.
            all_sums_dict = self.all_subset_sums(0, self.n)
            best_diff_seen, best_sum = min(((abs(x - goal), x) for x in all_sums_dict),
                                           key=lambda pair: pair[0])
            return self.recover_sum_members(all_sums_dict, best_sum)

        # Split nums into 4 equal-length parts (or as close as possible)
        half = self.n // 2
        bounds = [0, half // 2, half, half + (self.n - half) // 2, self.n]
        # first_arr = nums[bounds[0]:bounds[1]]
        # sec_arr = nums[bounds[1]:bounds[2]]
        # third_arr = nums[bounds[2]:bounds[3]]
        # fourth_arr = nums[bounds[3]:bounds[4]]

        first_table_dict = self.all_subset_sums(bounds[0], bounds[1])
        first_table = list(first_table_dict.keys())

        sec_table_dict = self.all_subset_sums(bounds[1], bounds[2])
        sec_table = sorted(sec_table_dict.keys())

        third_table_dict = self.all_subset_sums(bounds[2], bounds[3])
        third_table = list(third_table_dict.keys())

        fourth_table_dict = self.all_subset_sums(bounds[3], bounds[4])
        fourth_table = sorted(fourth_table_dict.keys(), reverse=True)

        # min_heap stores pairs of problems from T1 and T2, and
        # makes the pair with smallest sum in front of the heap
        # Format: (sum, T1-index, T2-index) triplets
        min_heap = [(x + sec_table[0], i, 0) for i, x in enumerate(first_table)]

        # max_heap stores pairs of problems from T3 and T4, and
        # makes the pair with largest sum in front of the heap
        # Format: (-sum, T3-index, T4-index) triplets
        max_heap = [(-(x + fourth_table[0]), i, 0) for i, x in enumerate(third_table)]

        heapq.heapify(min_heap)
        heapq.heapify(max_heap)

        best_diff_seen = math.inf
        best_diff_indices = []

        while len(min_heap) != 0 and len(max_heap) != 0:
            smallest, p1, p2 = min_heap[0]
            largest, p3, p4 = max_heap[0]
            largest = -largest

            ans_here = smallest + largest
            if abs(goal - ans_here) < best_diff_seen:
                best_diff_seen = abs(goal - ans_here)
                best_diff_indices = (p1, p2, p3, p4)

            if ans_here <= goal:  # Want sum to increase, so increase p2
                heapq.heappop(min_heap)
                if p2 + 1 != len(sec_table):
                    heapq.heappush(min_heap,
                                   (first_table[p1] + sec_table[p2 + 1], p1, p2 + 1))
            else:  # Want sum to decrease. so increase p4
                heapq.heappop(max_heap)
                if p4 + 1 != len(fourth_table):
                    heapq.heappush(max_heap,
                                   (-(third_table[p3] + fourth_table[p4 + 1]), p3, p4 + 1))

        sum_ans = []
        p1, p2, p3, p4 = best_diff_indices

        sum_ans.extend(self.recover_sum_members(first_table_dict, first_table[p1]))
        sum_ans.extend(self.recover_sum_members(sec_table_dict, sec_table[p2]))
        sum_ans.extend(self.recover_sum_members(third_table_dict, third_table[p3]))
        sum_ans.extend(self.recover_sum_members(fourth_table_dict, fourth_table[p4]))

        return sum_ans

    def solve_partition(self) -> Tuple[List[int], List[int]]:
        """Return a partition of nums into (smaller_sum_set, larger_sum_set)
        Finds a partition whose sum-difference is minimum.
        """
        total_sum = sum(self.nums)
        frequency_counts = collections.Counter(self.nums)
        first_subset = self.min_absolute_difference(goal=total_sum / 2.0)
        if self.orig_zeros != 0:
            first_subset.extend([0] * self.orig_zeros)

        remaining_subset = frequency_counts - collections.Counter(first_subset)
        remaining_subset = list(remaining_subset.elements())
        if sum(first_subset) <= sum(remaining_subset):
            return (first_subset, remaining_subset)
        else:
            return (remaining_subset, first_subset)

您可以在任何整数数组(最多 n=100 个元素)上这样调用它:

Solver = SubsetSumSolver([1, 5, 5, 6, 7, 10, 20])
print(Solver.solve_partition())

>>> ([10, 6, 5, 5, 1], [7, 20])