R Manipulating List of Lists With Conditions / Joining Data

Question

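I have the following data showing 5 kids who might be invited to a party and which neighborhoods they live in.

I also have a list of solutions (binary indicators of whether each kid is invited or not; e.g., the first solution invites Kelly, Gina, and Patty).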

data <- data.frame(c("Kelly", "Andrew", "Josh", "Gina", "Patty"), c(1, 1, 0, 1, 0), c(0, 1, 1, 1, 0))
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

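I'm now looking for a way to filter the solutions so that:

a) only solutions with at least 3 kids from Neighborhood A and at least 3 kids from Neighborhood B are kept (a kid who belongs to both neighborhoods counts toward both)

b) only solutions in which at least 3 kids are selected are kept (i.e., sum >= 3)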

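I think I need to somehow join `data` to the solutions in `solutions`, but I'm a bit lost on how to do that since everything in the solutions is stuck in lists. Basically, I'm looking for a way to add entries to each solution in the list indicating a) how many kids the solution has, b) how many of them are from Neighborhood A, and c) how many are from Neighborhood B. From there, I'd have to somehow filter the list to keep only the solutions where these counts are >= 3?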

Thanks in advance!

Answer 1

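I wrote a small function that checks each solution and returns TRUE or FALSE according to your requirements. Passing your solutions to it with sapply() gives you a logical vector, which you can use to subset solutions so that only the ones meeting the requirements are kept.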

check_solution <- function(solution, data) {
  data <- data[as.logical(solution),]
  sum(data[["Neighborhood A"]]) >= 3 && sum(data[["Neighborhood B"]]) >= 3
}
### No need for the function to test whether `sum(solution) >= 3`, since
### this will *always* be true if either neighborhood sum is >= 3.

tests <- sapply(solutions, check_solution, data = data)
# FALSE FALSE FALSE FALSE FALSE

solutions[tests]
# list()

### none of the `solutions` provided actually meet criteria
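Since the question also asks how to attach per-solution counts (number of kids, number from Neighborhood A, number from Neighborhood B) before filtering, here is a minimal base-R sketch that produces them without an explicit join: stack the solutions into a matrix and multiply it by the neighborhood indicator columns. It assumes the two-neighborhood `data` and `solutions` from the question, with the rows of `data` in the same kid order as the solution vectors; `sol_mat`, `nbhd_mat`, `counts` and `keep` are just illustrative names.

```r
data <- data.frame(c("Kelly", "Andrew", "Josh", "Gina", "Patty"), c(1, 1, 0, 1, 0), c(0, 1, 1, 1, 0))
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

# One row per solution, one column per kid (assumes same kid order as `data`)
sol_mat <- do.call(rbind, solutions)
# Neighborhood membership as a numeric kids-by-neighborhoods matrix
nbhd_mat <- as.matrix(data[, c("Neighborhood A", "Neighborhood B")])

# Entry [i, j] of the product = number of invited kids in solution i who live in neighborhood j
counts <- data.frame(
  kids = rowSums(sol_mat),
  sol_mat %*% nbhd_mat,
  check.names = FALSE
)
counts

# Keep solutions meeting both criteria (the `kids` test is redundant, kept for clarity)
keep <- counts$kids >= 3 &
  counts[["Neighborhood A"]] >= 3 &
  counts[["Neighborhood B"]] >= 3
solutions[keep]
# list()
```

Because each invited kid adds at most 1 to any neighborhood column, `kids` can never be smaller than either neighborhood count, which is why the `sum(solution) >= 3` condition does not need its own check.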

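Edit: OP asked in the comments how to test against all neighborhoods in the data, returning TRUE if a specified number of neighborhoods have enough kids. Here is a solution using dplyr.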

library(dplyr)

data <- data.frame(
  c("Kelly", "Andrew", "Josh", "Gina", "Patty"), 
  c(1, 1, 0, 1, 0), 
  c(0, 1, 1, 1, 0),
  c(1, 1, 1, 0, 1),
  c(0, 1, 1, 1, 1)
)
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B", "Neighborhood C", 
                 "Neighborhood D")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), 
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

check_solution <- function(solution, 
                           data, 
                           min_kids = 3, 
                           min_neighborhoods = NULL) {
  neighborhood_tests <- data %>% 
    filter(as.logical(solution)) %>% 
    summarize(across(starts_with("Neighborhood"), ~ sum(.x) >= min_kids)) %>% 
    as.logical()
  # require all neighborhoods by default
  if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
  sum(neighborhood_tests) >= min_neighborhoods
}

tests1 <- sapply(solutions, check_solution, data = data)
solutions[tests1]
# list()

tests2 <- sapply(
  solutions, 
  check_solution, 
  data = data, 
  min_kids = 2, 
  min_neighborhoods = 3
)
solutions[tests2]
# [[1]]
# [1] 1 0 0 1 1
# 
# [[2]]
# [1] 0 1 0 1 1
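If you would rather avoid dplyr, the same generalized check can be vectorised over all solutions at once in base R using a matrix product. This is a sketch, not a drop-in replacement: `check_all()` is a hypothetical helper, and it assumes every neighborhood column name starts with "Neighborhood" and that the rows of `data` are in the same kid order as the solution vectors.

```r
data <- data.frame(
  c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  c(1, 1, 0, 1, 0),
  c(0, 1, 1, 1, 0),
  c(1, 1, 1, 0, 1),
  c(0, 1, 1, 1, 1)
)
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B", "Neighborhood C",
                 "Neighborhood D")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

# Hypothetical helper: returns one TRUE/FALSE per solution
check_all <- function(solutions, data, min_kids = 3, min_neighborhoods = NULL) {
  sol_mat  <- do.call(rbind, solutions)                                  # solutions x kids
  nbhd_mat <- as.matrix(data[startsWith(names(data), "Neighborhood")])   # kids x neighborhoods
  counts   <- sol_mat %*% nbhd_mat                                       # solutions x neighborhoods
  # Require all neighborhoods by default, mirroring check_solution()
  if (is.null(min_neighborhoods)) min_neighborhoods <- ncol(nbhd_mat)
  # Number of neighborhoods with enough kids, compared to the required number
  rowSums(counts >= min_kids) >= min_neighborhoods
}

solutions[check_all(solutions, data)]
# list()
solutions[check_all(solutions, data, min_kids = 2, min_neighborhoods = 3)]
# [[1]]
# [1] 1 0 0 1 1
#
# [[2]]
# [1] 0 1 0 1 1
```

With these inputs it selects the same two solutions as `tests2`; the difference is that all solutions are scored in one matrix multiplication instead of one `filter()`/`summarize()` call per solution.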
