使用 python 中的约束最大化列总和
maximize column sum with constraints in python
我有如下数据框:
from scipy import optimize
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
'num_of_visits': [1,2,1,2,1,2,1,2],
'gain': [2,5, 3,4, 6,8, 5,10] })
customer num_of_visits gain
A 1 2
A 2 5
B 1 3
B 2 4
C 1 6
C 2 8
D 1 5
D 2 10
这只是一个例子。实际上,我们有很多很多客户。对于每个客户,销售代表可以进行一次访问或两次访问。可以在 gain
列中找到来自 1 或 2 次访问的销售收益。
例如拜访客户A一次,销售收益为2,拜访客户A两次,销售收益为5等。
目标是找到销售代表拜访的最佳客户集及其相应的拜访次数,以最大化总收益。
约束条件:
- 总访问次数为4次
- 每个客户一个实例(1 次或 2 次访问)
这是一个非常简化的例子,我们可以看到答案是:访问B一次,访问C一次,访问D两次。
要找到一个广义的解决方案,我们觉得这是一个优化问题。我们应该可以使用 python scipy.optimize
来找到解决方案。
我是优化新手。我们需要最大化列 gain
的总和。与用变量优化函数不同,我应该如何编写 objective 函数?我应该如何实施约束以确保每个客户一个实例?
我已经考虑了几个小时,但仍然没有头绪。
感谢任何人都可以帮助解决这个优化问题。
正如评论中已经提到的,这可以很容易地表述为整数线性规划:
min sum(gain[c, v]*b[c,v] for c in customers for v in visits[c])
s.t.
# visit each customer at most once
sum(b[c,v] for v in visits[c]) <= 1 for c in customers
# total number of visits
sum(b[c,v]*v for all v in visits, for all c in customers) == 4
这里,
gain[c,v]
是访问客户c恰好v次的收益
b[c,v]
是一个二元变量,如果客户 c 被访问 v 次则为 1,否则为 0。
customers
是客户列表
visits[c]
给你客户c 的可能访问次数
目前 scipy.optimize 不支持 ILP,但是 it's on the way :). In the meantime, you can use a modelling framework like PuLP:
import pandas as pd
import numpy as np
import pulp
df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
'num_of_visits': [1,2,1,2,1,2,1,2],
'gain': [2,5, 3,4, 6,8, 5,10] })
# Sets and parameters
customers = df.customer.unique()
gain = {(row.customer, row.num_of_visits): row.gain for idx, row in df.iterrows()}
visits = {c: df[df.customer == c].num_of_visits.values for c in customers}
# pulp model
mdl = pulp.LpProblem("milp", pulp.LpMaximize)
# define the binary variables
b = {}
for c in customers:
for v in visits[c]:
b[c, v] = pulp.LpVariable(f"b[{c}, {v}]", cat="Binary")
# define the objective
mdl.setObjective(sum(gain[c, v]*b[c, v] for c in customers for v in visits[c]))
# Visit each customer at most once
for c in customers:
mdl.addConstraint(sum(b[c, v] for v in visits[c]) <= 1)
# total number of visits
mdl.addConstraint(sum(b[c, v]*v for c in customers for v in visits[c]) == 4)
# solve the problem
mdl.solve()
# output the solution
print(f"Status: {pulp.LpStatus[mdl.status]}")
for c in customers:
for v in visits[c]:
if b[c, v].varValue > 0:
print(f"Visit customer {c} exactly {v} times")
print(f"objective value: {mdl.objective.value()}")
产生
Status: Optimal
Visit customer B exactly 1 times
Visit customer C exactly 1 times
Visit customer D exactly 2 times
objective value: 19.0
我有如下数据框:
from scipy import optimize
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
'num_of_visits': [1,2,1,2,1,2,1,2],
'gain': [2,5, 3,4, 6,8, 5,10] })
customer num_of_visits gain
A 1 2
A 2 5
B 1 3
B 2 4
C 1 6
C 2 8
D 1 5
D 2 10
这只是一个例子。实际上,我们有很多很多客户。对于每个客户,销售代表可以进行一次访问或两次访问。可以在 gain
列中找到来自 1 或 2 次访问的销售收益。
例如拜访客户A一次,销售收益为2,拜访客户A两次,销售收益为5等。
目标是找到销售代表拜访的最佳客户集及其相应的拜访次数,以最大化总收益。
约束条件:
- 总访问次数为4次
- 每个客户一个实例(1 次或 2 次访问)
这是一个非常简化的例子,我们可以看到答案是:访问B一次,访问C一次,访问D两次。
要找到一个广义的解决方案,我们觉得这是一个优化问题。我们应该可以使用 python scipy.optimize
来找到解决方案。
我是优化新手。我们需要最大化列 gain
的总和。与用变量优化函数不同,我应该如何编写 objective 函数?我应该如何实施约束以确保每个客户一个实例?
我已经考虑了几个小时,但仍然没有头绪。
感谢任何人都可以帮助解决这个优化问题。
正如评论中已经提到的,这可以很容易地表述为整数线性规划:
min sum(gain[c, v]*b[c,v] for c in customers for v in visits[c])
s.t.
# visit each customer at most once
sum(b[c,v] for v in visits[c]) <= 1 for c in customers
# total number of visits
sum(b[c,v]*v for all v in visits, for all c in customers) == 4
这里,
gain[c,v]
是访问客户c恰好v次的收益b[c,v]
是一个二元变量,如果客户 c 被访问 v 次则为 1,否则为 0。customers
是客户列表visits[c]
给你客户c 的可能访问次数
目前 scipy.optimize 不支持 ILP,但是 it's on the way :). In the meantime, you can use a modelling framework like PuLP:
import pandas as pd
import numpy as np
import pulp
df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
'num_of_visits': [1,2,1,2,1,2,1,2],
'gain': [2,5, 3,4, 6,8, 5,10] })
# Sets and parameters
customers = df.customer.unique()
gain = {(row.customer, row.num_of_visits): row.gain for idx, row in df.iterrows()}
visits = {c: df[df.customer == c].num_of_visits.values for c in customers}
# pulp model
mdl = pulp.LpProblem("milp", pulp.LpMaximize)
# define the binary variables
b = {}
for c in customers:
for v in visits[c]:
b[c, v] = pulp.LpVariable(f"b[{c}, {v}]", cat="Binary")
# define the objective
mdl.setObjective(sum(gain[c, v]*b[c, v] for c in customers for v in visits[c]))
# Visit each customer at most once
for c in customers:
mdl.addConstraint(sum(b[c, v] for v in visits[c]) <= 1)
# total number of visits
mdl.addConstraint(sum(b[c, v]*v for c in customers for v in visits[c]) == 4)
# solve the problem
mdl.solve()
# output the solution
print(f"Status: {pulp.LpStatus[mdl.status]}")
for c in customers:
for v in visits[c]:
if b[c, v].varValue > 0:
print(f"Visit customer {c} exactly {v} times")
print(f"objective value: {mdl.objective.value()}")
产生
Status: Optimal
Visit customer B exactly 1 times
Visit customer C exactly 1 times
Visit customer D exactly 2 times
objective value: 19.0