最小化重新分配个人的成本

Question

我有属于不同类别的个人，他们位于不同的位置区域，这些人口预计将从以下 population 值增长 demand 值。

population_and_demand_by_category_and_zone <- tibble::tribble(
  ~category, ~zone, ~population, ~demand,
        "A",     1,         115,     138,
        "A",     2,         121,     145,
        "A",     3,         112,     134,
        "A",     4,          76,      91,
        "B",     1,          70,      99,
        "B",     2,          59,      83,
        "B",     3,          86,     121,
        "B",     4,         139,     196,
        "C",     1,         142,     160,
        "C",     2,          72,      81,
        "C",     3,          29,      33,
        "C",     4,          58,      66,
        "D",     1,          22,      47,
        "D",     2,          23,      49,
        "D",     3,          16,      34,
        "D",     4,          45,      96
)

区域具有给定容量，当前人口低于此阈值，但需求某些区域将超出容量。

demand_and_capacity_by_zone <- tibble::tribble(
  ~zone, ~demand, ~capacity, ~capacity_exceeded,
      1,     444,       465,              FALSE,
      2,     358,       393,              FALSE,
      3,     322,       500,              FALSE,
      4,     449,       331,               TRUE
)

所以我们需要将这些人转移到一个新区域（我们假设我们有足够的总容量）。我们需要移动的每个人都会产生成本，这取决于它类别和目标区域。这些费用如下。

costs <- tibble::tribble(
  ~category, ~zone, ~cost,
        "A",     1,   0.1,
        "A",     2,   0.1,
        "A",     3,   0.1,
        "A",     4,   1.3,
        "B",     1,  16.2,
        "B",     2,  38.1,
        "B",     3,   1.5,
        "B",     4,   0.1,
        "C",     1,   0.1,
        "C",     2,  12.7,
        "C",     3,  97.7,
        "C",     4,  46.3,
        "D",     1,  25.3,
        "D",     2,   7.7,
        "D",     3,  67.3,
        "D",     4,   0.1
)

我希望找到跨区域和类别的个人分布，以便总成本最小化。所以基本上有一个新专栏 new_population 在上述 table population_and_demand_by_category_and_zone 中。

如果有几种可能的解决方案，如果结果不是整数，任何一种都可以人口，没问题。

实际用例大约有 20 个类别和 30 个区域，所以更大但不是那么大。

这似乎是一个很常见的问题，所以我希望有一种方便的方法可以在 R 中解决这个问题。

Answer 1

这可以建模为小型 LP（线性规划）模型。我们引入非负变量 move(c,z,z') 来指示要从区域 z 移动到区域 z' 的类别 c 的人数。数学模型可能如下所示：

这可以使用任何 LP 求解器来实现。解决方案可能如下所示：

----     83 VARIABLE move.L  moves needed to meet capacity

                 zone1       zone2       zone3

catA.zone1                       6
catA.zone4                      29          62
catC.zone4          27


----     83 VARIABLE alloc.L  new allocation

           zone1       zone2       zone3       zone4

catA         132         180         196
catB          99          83         121         196
catC         187          81          33          39
catD          47          49          34          96


----     83 VARIABLE totcost.L             =       12.400  total cost

备注：

有趣的是，该解决方案表明我们将人员移出 1 区，以便为 4 区的人员腾出空间。因此，在某些情况下，进行 2 次移动以重新安置一个人的成本更低。当然，这在很大程度上取决于成本结构。
主要约束说：allocation = demand + inflow - outflow
约束 move(c,z,z)=0 确保我们不会从 z 移动到 z 本身。这个约束并不是真正需要的（它由成本隐含地强制执行）。为了清楚起见，我添加了它。实际上，我通过将 move(c,z,z) 的上限设置为零（即没有明确的约束）来实现这一点。对于非常大的模型，我会使用另一种可能性：甚至不生成变量 move(c,z,z)。这个模型很小，所以不需要那个。如果你愿意，你可以完全省略它。
我没有在模型中使用 population。我认为没有必要，除非我们查看下一个项目符号。
有一些微妙之处需要思考：我们只能移动新人吗？（即应该允许原人留下来）

Answer 2

我采用了 Erwin 的公式，对其进行了修改，以考虑分配应该大于每个区域和类别的人口（这意味着已经存在的个人不会移动），并使用 {lpSolve} 实现它包，不需要安装外部系统库。

欧文的解可以通过下面的move_new_only <- FALSE获得。

设置

library(tidyverse)
library(lpSolve)

move_new_only  <- TRUE # means population in place can't be reallocated
categories     <- unique(population_and_demand_by_category_and_zone$category)
zones          <- unique(population_and_demand_by_category_and_zone$zone)
n_cat          <- length(categories)
n_zones        <- length(zones)

# empty coefficient arrays
move_coefs_template  <- array(0, c(n_zones, n_zones, n_cat), 
                              dimnames = list(zones, zones, categories))
alloc_coefs_template <- matrix(0, n_zones, n_cat, 
                               dimnames = list(zones, categories))

build_zone_by_category_matrix <- function(data, col) {
  data %>% 
    pivot_wider(
      id_cols = zone, names_from = category, values_from = {{col}}) %>% 
    as.data.frame() %>% 
    `row.names<-`(.$zone) %>% 
    select(-zone) %>% 
    as.matrix()
}

demand_mat <- build_zone_by_category_matrix(
  population_and_demand_by_category_and_zone, demand)

cost_mat <- build_zone_by_category_matrix(costs, cost)

population_mat <- build_zone_by_category_matrix(
  population_and_demand_by_category_and_zone, population)

OBJECTIVE 功能：总成本

# stack the cost matrix vertically to build an array of all move coefficients
coefs_obj <- move_coefs_template
for(i in 1:n_zones) {
  coefs_obj[i,,] <- cost_mat
}
# flatten it for `lp`s `objective.in` argument, adding alloc coefs
f.obj <- c(coefs_obj, alloc_coefs_template)

约束 1：分配 = 需求 + 流入 - 流出

coefs_con <- list() 
for (z in zones) {
  coefs_con_zone <- list() 
  for(categ in categories) {
    coefs_arrivals <- move_coefs_template
    coefs_arrivals[,z, categ] <- 1
    coefs_departures <- move_coefs_template
    coefs_departures[z,, categ] <- 1
    coefs_moves <- coefs_arrivals - coefs_departures
    coefs_alloc <- alloc_coefs_template
    coefs_alloc[z, categ] <- -1
    # flatten the array
    coefs_con_zone[[categ]] <- c(coefs_moves, coefs_alloc)
  }
  coefs_con[[z]] <- do.call(rbind, coefs_con_zone)
}

# stack the flattened arrays to build a matrix
f.con1 <- do.call(rbind, coefs_con)
f.dir1 <- rep("==", n_zones * n_cat)
f.rhs1 <- -c(t(demand_mat)) # transposing so we start with all zone 1 and so on

约束 2：分配永远不会超过容量

coefs_con <- list() 
for (z in zones) {
  coefs_alloc <- alloc_coefs_template
  coefs_alloc[z, ] <- 1
  coefs_con[[z]] <- c(move_coefs_template, coefs_alloc)
}

# stack the flattened arrays to build a matrix
f.con2 <- do.call(rbind, coefs_con)
f.dir2 <- rep("<=", n_zones)

f.rhs2 <- demand_and_capacity_by_zone$capacity

约束 3：分配 >= 人口

即我们不会移动已经在那里的人。

如果我们决定可以移动它们，则规则变为 Allocation >= 0，我们得到欧文的答案。

coefs_con <- list() 
for (z in zones) {
  coefs_con_zone <- list() 
  for(categ in categories) {
    coefs_alloc <- alloc_coefs_template
    coefs_alloc[z, categ] <- 1
    # flatten the array
    coefs_con_zone[[categ]] <- c(move_coefs_template, coefs_alloc)
  }
  coefs_con[[z]] <- do.call(rbind, coefs_con_zone)
}

# stack the flattened arrays to build a matrix
f.con3 <- do.call(rbind, coefs_con)
f.dir3 <- rep(">=", n_zones * n_cat)

if (move_new_only) {
  f.rhs3 <- c(t(population_mat))
} else {
  f.rhs3 <- rep(0, n_zones * n_cat)
}

连接对象

f.con <- rbind(f.con1, f.con2, f.con3)
f.dir <- c(f.dir1, f.dir2, f.dir3)
f.rhs <- c(f.rhs1, f.rhs2, f.rhs3)

解决

# compute the solution and visualize it in the array
results_raw <- lp("min", f.obj, f.con, f.dir, f.rhs)$solution
results_moves <- move_coefs_template
results_moves[] <- 
  results_raw[1:length(results_moves)]
results_allocs <- alloc_coefs_template
results_allocs[] <- 
  results_raw[length(results_moves)+(1:length(results_allocs))]
results_moves
#> , , A
#> 
#>    1 2 3 4
#> 1  0 0 0 0
#> 2  0 0 3 0
#> 3  0 0 0 0
#> 4 13 0 2 0
#> 
#> , , B
#> 
#>   1 2  3 4
#> 1 0 0  0 0
#> 2 0 0  0 0
#> 3 0 0  0 0
#> 4 0 0 57 0
#> 
#> , , C
#> 
#>   1 2 3 4
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 8 0 0 0
#> 
#> , , D
#> 
#>   1  2 3 4
#> 1 0  0 0 0
#> 2 0  0 0 0
#> 3 0  0 0 0
#> 4 0 38 0 0

results_allocs
#>     A   B   C  D
#> 1 151  99 168 47
#> 2 142  83  81 87
#> 3 139 178  33 34
#> 4  76 139  58 58

整齐的结果

# format as tidy data frame
results_df <-
  as.data.frame.table(results_moves) %>% 
  setNames(c("from", "to", "category", "n")) %>% 
  filter(n != 0) %>% 
  mutate_at(c("from", "to"), as.numeric) %>% 
  mutate_if(is.factor, as.character)

results_df
#>   from to category  n
#> 1    4  1        A 13
#> 2    2  3        A  3
#> 3    4  3        A  2
#> 4    4  3        B 57
#> 5    4  1        C  8
#> 6    4  2        D 38

更新表格

population_and_demand_by_category_and_zone <-
  bind_rows(
  results_df %>% 
  group_by(category, zone = to) %>% 
  summarize(correction = sum(n), .groups = "drop"),
  results_df %>% 
    group_by(category, zone = from) %>% 
    summarize(correction = -sum(n), .groups = "drop"),
  ) %>% 
  left_join(population_and_demand_by_category_and_zone, ., by = c("category", "zone")) %>% 
  replace_na(list(correction =0)) %>% 
  mutate(new_population = demand + correction)

population_and_demand_by_category_and_zone
#> # A tibble: 16 × 6
#>    category  zone population demand correction new_population
#>    <chr>    <dbl>      <dbl>  <dbl>      <dbl>          <dbl>
#>  1 A            1        115    138      13               151
#>  2 A            2        121    145      -3.00            142
#>  3 A            3        112    134       5.00            139
#>  4 A            4         76     91     -15.0              76
#>  5 B            1         70     99       0                99
#>  6 B            2         59     83       0                83
#>  7 B            3         86    121      57               178
#>  8 B            4        139    196     -57               139
#>  9 C            1        142    160       8               168
#> 10 C            2         72     81       0                81
#> 11 C            3         29     33       0                33
#> 12 C            4         58     66      -8                58
#> 13 D            1         22     47       0                47
#> 14 D            2         23     49      38                87
#> 15 D            3         16     34       0                34
#> 16 D            4         45     96     -38                58


demand_and_capacity_by_zone <-
  population_and_demand_by_category_and_zone %>% 
  group_by(zone) %>% 
  summarise(population = sum(population), correction = sum(correction), new_population = sum(new_population)) %>% 
  left_join(demand_and_capacity_by_zone, ., by = "zone")
#> `summarise()` ungrouping output (override with `.groups` argument)
  
demand_and_capacity_by_zone
#> # A tibble: 4 × 7
#>    zone demand capacity capacity_exceeded population correction new_population
#>   <dbl>  <dbl>    <dbl> <lgl>                  <dbl>      <dbl>          <dbl>
#> 1     1    444      465 FALSE                    349         21            465
#> 2     2    358      393 FALSE                    275         35            393
#> 3     3    322      500 FALSE                    243         62            384
#> 4     4    449      331 TRUE                     318       -118            331

我们看到人口从未减少并且一直处于容量不足的状态。

最小化重新分配个人的成本

Minimise cost of reallocating individuals

optimization

r

cost-management

loss-function

设置