根据多个条件查找大型列表的所有组合

Finding all combinations based on multiple conditions for a large list

我正在尝试计算 Fantasy Cycling 游戏的最佳团队。我有一个 csv 文件,其中包含 176 名自行车手、他们的球队、他们得分的数量以及将他们放入我的球队的价格。我正在尝试找到 16 名自行车手中得分最高的球队。

适用于任何团队组成的规则是:

可以在下面找到我的 csv 文件的简短摘录。

THOMAS Geraint,Team INEOS,142,13
SAGAN Peter,BORA - hansgrohe,522,11.5
GROENEWEGEN Dylan,Team Jumbo-Visma,205,11
FUGLSANG Jakob,Astana Pro Team,46,10
BERNAL Egan,Team INEOS,110,10
BARDET Romain,AG2R La Mondiale,21,9.5
QUINTANA Nairo,Movistar Team,58,9.5
YATES Adam,Mitchelton-Scott,40,9.5
VIVIANI Elia,Deceuninck - Quick Step,273,9.5
YATES Simon,Mitchelton-Scott,13,9
EWAN Caleb,Lotto Soudal,13,9

解决此问题的最简单方法是生成所有可能的团队组合列表,然后应用规则排除不符合上述规则的团队。在此之后,计算每支球队的总分并选出得分最高的球队就很简单了。理论上,生成可用团队列表可以通过下面的代码实现。

database_csv = pd.read_csv('renner_db_example.csv')
renners = database_csv.to_dict(orient='records')

budget = 100
max_same_team = 4
team_total = 16

combos = itertools.combinations(renners,team_total)
usable_combos = []

for i in combos:
    if sum(persoon["Waarde"] for persoon in i)  < budget and all(z <= max_same_team for z in [len(list(group)) for key, group in groupby([persoon["Ploeg"] for persoon in i])]) == True:   
    usable_combos.append(i)    

但是,将 176 名骑自行车者的列表的所有组合计算为 16 人的团队只是 too many calculations for my computer to handle, even though the answer to this 问题暗示了其他事情。我做了一些研究,但找不到任何方法来应用上述条件,而不必遍历每个选项。

有没有办法找到最佳团队,既可以使上述方法奏效,也可以使用其他方法?

编辑: 在文本中,可以找到完整的 CSV 文件 here

这是一个经典的operations research问题。

有大量算法可以找到最佳(或非常好,具体取决于算法)解决方案:

  • Mixed-Integer 编程
  • 元启发式
  • 约束规划
  • ...

这是一个代码,它将使用 MIP, ortools library and default solver COIN-OR 找到最佳解决方案:

from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)    
cyclist_df = pd.read_csv('cyclists.csv')

# Variables

variables_name = {}
variables_team = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

# Constraints

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])    

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    objective.SetCoefficient(variables_name[row['Naam']], row['Punten totaal:'])

# Solve and retrieve solution     
solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print(cyclist_df[cyclist_df.Naam.isin(chosen_cyclists)])

打印:

    Naam                Ploeg                       Punten totaal:  Waarde
1   SAGAN Peter         BORA - hansgrohe            522             11.5
2   GROENEWEGEN         Dylan   Team Jumbo-Visma    205             11.0
8   VIVIANI Elia        Deceuninck - Quick Step     273             9.5
11  ALAPHILIPPE Julian  Deceuninck - Quick Step     399             9.0
14  PINOT Thibaut       Groupama - FDJ              155             8.5
15  MATTHEWS Michael    Team Sunweb                 323             8.5
22  TRENTIN Matteo      Mitchelton-Scott            218             7.5
24  COLBRELLI Sonny     Bahrain Merida              238             6.5
25  VAN AVERMAET Greg   CCC Team                    192             6.5
44  STUYVEN Jasper      Trek - Segafredo            201             4.5
51  CICCONE Giulio      Trek - Segafredo            153             4.0
82  TEUNISSEN Mike      Team Jumbo-Visma            255             3.0
83  HERRADA Jesús       Cofidis, Solutions Crédits  255             3.0
104 NIZZOLO Giacomo     Dimension Data              121             2.5
123 MEURISSE Xandro     Wanty - Groupe Gobert       141             2.0
151 TRATNIK Jan Bahrain Merida                      87              1.0

这段代码是如何解决问题的?正如@KyleParsons 所说,它看起来像背包问题,可以使用整数编程进行建模。

让我们定义变量 Xi (0 <= i <= nb_cyclists)Yj (0 <= j <= nb_teams).

Xi = 1 if cyclist n°i is chosen, =0 otherwise

Yj = n where n is the number of cyclists chosen within team j

要定义这些变量之间的关系,您可以对这些约束进行建模:

# Link cyclist <-> team
For all j, Yj >= sum(Xi, for all i where Xi is part of team j)

为了 select 每个团队最多只有 4 名自行车手,您创建了这些限制条件:

# Max 4 cyclist per team
For all j, Yj <= 4

致 select 16 位骑自行车的人,这里是相关的限制条件:

# Min 16 cyclists 
sum(Xi, 1<=i<=nb_cyclists) >= 16
# Max 16 cyclists 
sum(Xi, 1<=i<=nb_cyclists) <= 16

成本限制:

# Max cost 
sum(ci * Xi, 1<=i<=n_cyclists) <= 100 
# where ci = cost of cyclist i

然后可以最大化

# Objective
max sum(pi * Xi, 1<=i<=n_cyclists)
# where pi = nb_points of cyclist i

请注意,我们使用线性 objective 和线性不等式约束对问题建模。如果 Xi 和 Yj 是连续变量,则此问题将是 polynomial(线性规划)并且可以使用以下方法解决:

因为这些变量是整数(整数规划或混合整数规划),该问题被称为 NP_complete class (cannot be solved using polynomial solutions unless you are a genious). Solvers like COIN-OR use complex Branch & Bound or Branch & Cut 有效解决它们的方法的一部分。 ortools 提供了一个很好的包装器来将 COIN 与 python 一起使用。这些工具是免费和开源的。

所有这些方法的优点是无需迭代所有可能的解决方案即可找到最佳解决方案(并大大减少组合​​)。

我为您的问题添加了另一个答案:

The CSV I posted was actually modified, my original one also contains a list for each rider with their score for each stage. This list looks like this [0, 40, 13, 0, 2, 55, 1, 17, 0, 14]. I am trying to find the team that performs the best overall. So I have a pool of 16 cyclists, from which the score of 10 cyclists counts towards the score of each day. The scores for each day are then summed to get a total score. The purpose is to get this final total score as high as possible.

如果您认为我应该编辑我的第一个 post 请告诉我,我认为这样更清楚,因为我的第一个 post 非常密集并且回答了最初的问题。

让我们引入一个新变量:

Zik = 1 if cyclist i is selected and is one of the top 10 in your team on day k

您需要将这些约束添加到 link 变量 Zik 和 Xi(如果骑车人 i 未被 select 编辑,即如果 Xi = 0,则变量 Zik 不能 = 1)

For all i, sum(Zik, 1<=k<=n_days) <= n_days * Xi

这些限制 select 每天 10 名骑车人:

For all k, sum(Zik, 1<=i<=n_cyclists) <= 10

最后,你的objective可以这样写:

Maximize sum(pik * Xi * Zik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k

这里是思考的部分。像这样写的 objective 不是线性的(注意两个变量 X 和 Z 之间的乘法)。幸运的是,有两个二进制文件,并且有一个技巧可以将此公式转换为线性形式。

让我们再次引入新的变量 Lik (Lik = Xi * Zik) 来线性化 objective.

objective 现在可以这样写并且是线性的:

Maximize sum(pik * Lik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k

我们现在需要添加这些约束以使 Lik 等于 Xi * Zik :

For all i,k : Xi + Zik - 1 <= Lik
For all i,k : Lik <= 1/2 * (Xi + Zik)

瞧瞧。这就是数学的美妙之处,你可以用线性方程来模拟很多东西。我提出的是高深的概念,你看不懂是正常的。


我在这个 file.

上模拟了每天的得分列

这是解决新问题的Python代码:

import ast
from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
cyclist_df = pd.read_csv('cyclists_day.csv')
cyclist_df['Punten_day'] = cyclist_df['Punten_day'].apply(ast.literal_eval)

# Variables
variables_name = {}
variables_team = {}
variables_name_per_day = {}
variables_linear = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

    for k in range(10):
        variables_name_per_day[(row['Naam'], k)] = solver.IntVar(0, 1, 'z_{}_{}'.format(row['Naam'], k))
        variables_linear[(row['Naam'], k)] = solver.IntVar(0, 1, 'l_{}_{}'.format(row['Naam'], k))

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])

# Link Zik and Xi
for name, cyclist in variables_name.items():
    constraint_link_cyclist_day = solver.Constraint(-solver.infinity(), 0)
    constraint_link_cyclist_day.SetCoefficient(cyclist, - 10)
    for k in range(10):
        constraint_link_cyclist_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Min/Max 10 cyclists per day
for k in range(10):
    constraint_cyclist_per_day = solver.Constraint(10, 10)
    for name in cyclist_df.Naam:
        constraint_cyclist_per_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Linearization constraints 
for name, cyclist in variables_name.items():
    for k in range(10):
        constraint_linearization1 = solver.Constraint(-solver.infinity(), 1)
        constraint_linearization2 = solver.Constraint(-solver.infinity(), 0)

        constraint_linearization1.SetCoefficient(cyclist, 1)
        constraint_linearization1.SetCoefficient(variables_name_per_day[name, k], 1)
        constraint_linearization1.SetCoefficient(variables_linear[name, k], -1)

        constraint_linearization2.SetCoefficient(cyclist, -1/2)
        constraint_linearization2.SetCoefficient(variables_name_per_day[name, k], -1/2)
        constraint_linearization2.SetCoefficient(variables_linear[name, k], 1)

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    for k in range(10):
        objective.SetCoefficient(variables_linear[row['Naam'], k], row['Punten_day'][k])

solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print('\n'.join(chosen_cyclists))

for k in range(10):
    print('\nDay {} :'.format(k + 1))
    chosen_cyclists_day = [name for (name, day), variable in variables_name_per_day.items() 
                       if (day == k and variable.solution_value() > 0.5)]
    assert len(chosen_cyclists_day) == 10
    assert all(chosen_cyclists_day[i] in chosen_cyclists for i in range(10))
    print('\n'.join(chosen_cyclists_day))

结果如下:

你的团队:

SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
BENOOT Tiesj
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro
GRELLIER Fabien

每天选定的​​骑车人

Day 1 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 2 :
SAGAN Peter
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
NIZZOLO Giacomo
MEURISSE Xandro

Day 3 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
MATTHEWS Michael
TRENTIN Matteo
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 4 :
SAGAN Peter
VIVIANI Elia
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús

Day 5 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
CICCONE Giulio
HERRADA Jesús

Day 6 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike

Day 7 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro

Day 8 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo
MEURISSE Xandro

Day 9 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
TEUNISSEN Mike
HERRADA Jesús

Day 10 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
PINOT Thibaut
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo

让我们比较一下答案 1 和答案 2 的结果 print(solver.Objective().Value())

第一个模型获得 3738.0,第二个模型获得 3129.087388325567。该值较低,因为您 select 每个阶段只有 10 名骑车人,而不是 16 名。

现在如果保留第一个解决方案并使用新的评分方法,我们得到 3122.9477585307413

我们可以认为第一个模型已经足够好了:我们不必引入新的 variables/constraints,模型保持简单,我们得到的解决方案几乎与复杂模型一样好。有时不需要 100% 准确,使用一些近似值可以更轻松、更快速地求解模型。