使用 python 中的约束最大化列总和

maximize column sum with constraints in python

我有如下数据框:

from scipy import optimize
import pandas as pd
import numpy as np

df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
                   'num_of_visits': [1,2,1,2,1,2,1,2],
                   'gain': [2,5, 3,4, 6,8, 5,10] })

customer    num_of_visits   gain
   A            1            2
   A            2            5
   B            1            3
   B            2            4
   C            1            6
   C            2            8
   D            1            5
   D            2            10

这只是一个例子。实际上,我们有很多很多客户。对于每个客户,销售代表可以进行一次访问或两次访问。可以在 gain 列中找到来自 1 或 2 次访问的销售收益。 例如拜访客户A一次,销售收益为2,拜访客户A两次,销售收益为5等。 目标是找到销售代表拜访的最佳客户集及其相应的拜访次数,以最大化总收益。 约束条件:

  1. 总访问次数为4次
  2. 每个客户一个实例(1 次或 2 次访问)

这是一个非常简化的例子,我们可以看到答案是:访问B一次,访问C一次,访问D两次。

要找到一个广义的解决方案,我们觉得这是一个优化问题。我们应该可以使用 python scipy.optimize 来找到解决方案。

我是优化新手。我们需要最大化列 gain 的总和。与用变量优化函数不同,我应该如何编写 objective 函数?我应该如何实施约束以确保每个客户一个实例? 我已经考虑了几个小时,但仍然没有头绪。 感谢任何人都可以帮助解决这个优化问题。

正如评论中已经提到的,这可以很容易地表述为整数线性规划:

min sum(gain[c, v]*b[c,v] for c in customers for v in visits[c])

s.t. 

# visit each customer at most once
sum(b[c,v] for v in visits[c]) <= 1 for c in customers

# total number of visits
sum(b[c,v]*v for all v in visits, for all c in customers) ==  4

这里,

  • gain[c,v]是访问客户c恰好v次的收益
  • b[c,v] 是一个二元变量,如果客户 c 被访问 v 次则为 1,否则为 0。
  • customers是客户列表
  • visits[c]给你客户c
  • 的可能访问次数

目前 scipy.optimize 不支持 ILP,但是 it's on the way :). In the meantime, you can use a modelling framework like PuLP:

import pandas as pd
import numpy as np
import pulp


df = pd.DataFrame({ 'customer':['A','A','B', 'B', 'C', 'C', 'D', 'D'],
                   'num_of_visits': [1,2,1,2,1,2,1,2],
                   'gain': [2,5, 3,4, 6,8, 5,10] })

# Sets and parameters
customers = df.customer.unique()
gain = {(row.customer, row.num_of_visits): row.gain for idx, row in df.iterrows()}
visits = {c: df[df.customer == c].num_of_visits.values for c in customers}

# pulp model
mdl = pulp.LpProblem("milp", pulp.LpMaximize)

# define the binary variables
b = {}
for c in customers:
    for v in visits[c]:
        b[c, v] = pulp.LpVariable(f"b[{c}, {v}]", cat="Binary")


# define the objective
mdl.setObjective(sum(gain[c, v]*b[c, v] for c in customers for v in visits[c]))

# Visit each customer at most once
for c in customers:
    mdl.addConstraint(sum(b[c, v] for v in visits[c]) <= 1)

# total number of visits
mdl.addConstraint(sum(b[c, v]*v for c in customers for v in visits[c]) == 4)

# solve the problem
mdl.solve()

# output the solution
print(f"Status: {pulp.LpStatus[mdl.status]}")
for c in customers:
    for v in visits[c]:
        if b[c, v].varValue > 0:
            print(f"Visit customer {c} exactly {v} times")
print(f"objective value: {mdl.objective.value()}")

产生

Status: Optimal
Visit customer B exactly 1 times
Visit customer C exactly 1 times
Visit customer D exactly 2 times
objective value: 19.0