对于一个非常大的列表,Python 中可能最接近目标总和的值的子集总和问题
Subset sum problem for a possible closest value to the target sum in Python for a really big list
我知道这个问题可能很多人问过。我的挑战是我的列表很长(比如 100+),我的目标也可能很大。递归算法需要 10 分钟才能完成。我需要任何一个子集使其最接近目标总和
我的代码
combination_dict = {}
def subset_sum(numbers, target, partial=[]):
s = sum(partial)
if s == target:
print("sum({})={}".format(partial, target))
return
if s >= target:
partial.pop()
combination_dict[sum(partial)] = partial
return
for i in range(len(numbers)):
n = numbers[i]
remaining = numbers[i + 1:]
subset_sum(remaining, target, partial + [n])
if __name__ == "__main__":
lst = [100, 100, 100, 100, 100, 100, 100] #my actual list is really long len=100(max)
tgt = 434
subset_sum(lst, tgt)
closest_sum = max([i for i in combination_dict.keys()])
print("Closest possible sum={} and the combination is ={}".format(closest_sum, combination_dict[closest_sum]))
如果我的列表长度> 20,它就会卡住。我知道这有 O(2^n) 的时间复杂度 - 但有人可以帮助我优化它以在不到 30 秒的时间内得到我的结果。
我的列表可以有浮动。目标总和也是如此。
您可以使用整数线性规划,其中每个变量都是对应于一个数字并表示该数字是否包含在结果中的二进制。它解决了两个问题,第一个是从下方逼近目标值,第二个是从上方逼近目标值,然后取最优解。以下是使用 PuLP:
的示例实现
import numpy as np
from pulp import LpMaximize, LpMinimize, LpProblem, lpSum, LpVariable
rng = np.random.default_rng(seed=0)
numbers = rng.integers(1, 10**6, size=10**4)
target = int(numbers.mean() * rng.normal(loc=1, scale=0.1))
indices = range(len(numbers))
variables = LpVariable.dicts("Choice", indices, cat="Binary")
leq_prob = LpProblem('leq', LpMaximize)
geq_prob = LpProblem('geq', LpMinimize)
leq_prob += lpSum([variables[i]*numbers[i] for i in indices]) <= target
leq_prob += lpSum([variables[i]*numbers[i] for i in indices])
geq_prob += lpSum([variables[i]*numbers[i] for i in indices]) >= target
geq_prob += lpSum([variables[i]*numbers[i] for i in indices])
leq_prob.solve()
leq_choices = [numbers[i] for i in indices if variables[i].value() == 1]
if sum(leq_choices) == target:
solution = leq_choices
else:
geq_prob.solve()
geq_choices = [numbers[i] for i in indices if variables[i].value() == 1]
solution = (
leq_choices
if target-sum(leq_choices) <= sum(geq_choices)-target
else geq_choices
)
print(f'Solution: {solution}')
print(f'Sum: {sum(solution)} (target: {target})')
我知道这个问题可能很多人问过。我的挑战是我的列表很长(比如 100+),我的目标也可能很大。递归算法需要 10 分钟才能完成。我需要任何一个子集使其最接近目标总和
我的代码
combination_dict = {}
def subset_sum(numbers, target, partial=[]):
s = sum(partial)
if s == target:
print("sum({})={}".format(partial, target))
return
if s >= target:
partial.pop()
combination_dict[sum(partial)] = partial
return
for i in range(len(numbers)):
n = numbers[i]
remaining = numbers[i + 1:]
subset_sum(remaining, target, partial + [n])
if __name__ == "__main__":
lst = [100, 100, 100, 100, 100, 100, 100] #my actual list is really long len=100(max)
tgt = 434
subset_sum(lst, tgt)
closest_sum = max([i for i in combination_dict.keys()])
print("Closest possible sum={} and the combination is ={}".format(closest_sum, combination_dict[closest_sum]))
如果我的列表长度> 20,它就会卡住。我知道这有 O(2^n) 的时间复杂度 - 但有人可以帮助我优化它以在不到 30 秒的时间内得到我的结果。 我的列表可以有浮动。目标总和也是如此。
您可以使用整数线性规划,其中每个变量都是对应于一个数字并表示该数字是否包含在结果中的二进制。它解决了两个问题,第一个是从下方逼近目标值,第二个是从上方逼近目标值,然后取最优解。以下是使用 PuLP:
的示例实现import numpy as np
from pulp import LpMaximize, LpMinimize, LpProblem, lpSum, LpVariable
rng = np.random.default_rng(seed=0)
numbers = rng.integers(1, 10**6, size=10**4)
target = int(numbers.mean() * rng.normal(loc=1, scale=0.1))
indices = range(len(numbers))
variables = LpVariable.dicts("Choice", indices, cat="Binary")
leq_prob = LpProblem('leq', LpMaximize)
geq_prob = LpProblem('geq', LpMinimize)
leq_prob += lpSum([variables[i]*numbers[i] for i in indices]) <= target
leq_prob += lpSum([variables[i]*numbers[i] for i in indices])
geq_prob += lpSum([variables[i]*numbers[i] for i in indices]) >= target
geq_prob += lpSum([variables[i]*numbers[i] for i in indices])
leq_prob.solve()
leq_choices = [numbers[i] for i in indices if variables[i].value() == 1]
if sum(leq_choices) == target:
solution = leq_choices
else:
geq_prob.solve()
geq_choices = [numbers[i] for i in indices if variables[i].value() == 1]
solution = (
leq_choices
if target-sum(leq_choices) <= sum(geq_choices)-target
else geq_choices
)
print(f'Solution: {solution}')
print(f'Sum: {sum(solution)} (target: {target})')