Nested ifelse() or case_when() for an unknown number of queries in R

Question

I have a data frame, and I want to assign each row to a group based on the value in a given column of that row.

my_data <- data.frame(matrix(ncol = 3, nrow = 4))
colnames(my_data) <- c('Position', 'Group', 'Data')

my_data[,1] <- c('A1','B1','C1','D1')
my_data[,3] <- c(1,2,3,4)

grps <- list(c('A1','B1'),
             c('C1','D1'))

grp.names = c("Control", "Exp1", "EMPTY")


library(dplyr)  # for case_when()

my_data$Group <- case_when(
  my_data$Position %in% grps[[1]] ~ grp.names[1],
  my_data$Position %in% grps[[2]] ~ grp.names[2]
)

or

my_data$Group <- with(my_data, ifelse(Position %in% grps[[1]], grp.names[1],
                                    ifelse(Position %in% grps[[2]], grp.names[2], 
                                    grp.names[3])))

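These examples work and produce a Group column with the appropriate labels, but I need flexibility in the length of the grps list, anywhere from 1 to about 25 groups.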

I can't see a way to iterate case_when or ifelse inside a for loop, for example:

my_data$Group <- for (i in 1:length(grps)){
  case_when(
    my_data$Well %in% grps[[i]] ~ grp.names[i])
}

This example just deletes the Group column.
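The reason is that a `for` loop in R always returns `NULL` (invisibly), whatever its body computes, so assigning the loop to a column is the same as assigning `NULL`, and assigning `NULL` to a data frame column removes it. A minimal demonstration on a throwaway data frame (`df` and `res` are illustrative names, not from the question):

```r
# Throwaway data frame, for illustration only
df <- data.frame(a = 1:2, b = 3:4)

# The value of a for loop is NULL; the body's results are discarded
res <- for (i in 1:2) i * 10
print(is.null(res))

# Assigning NULL to a column deletes that column
df$b <- res
print(names(df))
```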

What is the most appropriate way to handle a variable-length grps?

Answer 1

Method 1: hash table

I would take a different approach here, since the group composition may change during the analysis: a lookup table of key-value pairs, plus a small accessor function.

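library(tidyverse)

# First, a small adjustment to `grps` to reflect an empty group.
grps <- list(c('A1','B1'),
             c('C1','D1'),
             NULL)
names <- unlist(grps, use.names = FALSE)        # every position, in group order
values <- rep(grp.names, map_int(grps, length)) # the group name for each position

# An environment serves as R's hash table: key = position, value = group name
h <- as.list(values) %>%
  set_names(names) %>%
  list2env()

# Find x in h; h is picked up from the enclosing environment (lexical scoping)
f <- Vectorize(function(x) h[[x]], "x")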

It takes a little setup, but it is convenient to use:

my_data %>%
  mutate(Group = f(Position))

  Position   Group Data
1       A1 Control    1
2       B1 Control    2
3       C1    Exp1    3
4       D1    Exp1    4

This avoids changing code in several places, and it works with any number of groups.
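One gap: `h[[x]]` returns `NULL` for a position that belongs to no group, so `f()` then returns a list instead of a character vector. A minimal self-contained sketch, assuming unmatched positions should get the entry after the last group (`"EMPTY"`); the helper `lookup_group` and the extra position `"E1"` are hypothetical, added only for illustration:

```r
library(tidyverse)

grps <- list(c('A1','B1'), c('C1','D1'))
grp.names <- c("Control", "Exp1", "EMPTY")
# "E1" is a made-up position that belongs to no group
my_data <- data.frame(Position = c('A1','B1','C1','D1','E1'), Data = 1:5)

# Hash table (environment): key = position, value = its group name
h <- as.list(rep(grp.names[seq_along(grps)], lengths(grps))) %>%
  set_names(unlist(grps)) %>%
  list2env()

# Hypothetical helper: look up each position, falling back to `default`
lookup_group <- function(x, env, default) {
  vapply(as.character(x), function(key) {
    # inherits = FALSE: only search h itself, not its parent environments
    if (exists(key, envir = env, inherits = FALSE)) env[[key]] else default
  }, character(1), USE.NAMES = FALSE)
}

result <- my_data %>%
  mutate(Group = lookup_group(Position, h, default = grp.names[length(grps) + 1]))
print(result)
```

Using `vapply()` with `character(1)` guarantees a plain character vector, so `mutate()` always receives one label per row.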

Method 2: dynamic switch

Alternatively, we can build a switch expression of arbitrary length, constructing it from each position and its group name.

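constructor <- function(ids, names){
  # One '"id"="name"' pair per position, e.g. "A1"="Control"
  purrr::imap_chr(as.character(ids), ~paste(paste0("\"", .x ,"\""),
                                            paste0("\"", names[.y], "\""),
                                            sep = "=")) %>%
    paste0(collapse = ", ") %>%
    # Wrap the pairs in a vectorised switch() with NA as the fall-through value
    paste0("Vectorize(function(x) switch(as.character(x), ", ., ", NA))", collapse = "") %>%
    # Parse the string into an unevaluated expression
    str2expression()
}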

my_data %>%
  mutate(Group = eval(constructor(names, values))(Position))

In this case, the expression that gets evaluated (yielding a function, which is then applied to Position) is

expression(Vectorize(function(x) switch(as.character(x), A1 = "Control", 
    B1 = "Control", C1 = "Exp1", D1 = "Exp1", 
    NA)))
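Pasting strings and parsing them works, but it breaks if a position or group name contains a double quote. A sketch of the same switch-based lookup that builds the call from a named list with `do.call()` instead of from text; this is a swapped-in technique rather than the answer's own code, and `make_switcher` and `to_group` are hypothetical names:

```r
grps <- list(c('A1','B1'), c('C1','D1'))
grp.names <- c("Control", "Exp1", "EMPTY")
my_data <- data.frame(Position = c('A1','B1','C1','D1'), Data = 1:4)

# Hypothetical helper: returns a vectorised function mapping a position
# to its group name via switch(), with `default` as the fall-through value
make_switcher <- function(keys, values, default = NA_character_) {
  cases <- as.list(setNames(values, keys))  # named alternatives for switch()
  function(x) {
    vapply(as.character(x), function(key) {
      # Same as switch(key, A1 = "Control", ..., default), built without parsing text
      do.call(switch, c(list(key), cases, list(default)))
    }, character(1), USE.NAMES = FALSE)
  }
}

keys <- unlist(grps, use.names = FALSE)
values <- rep(grp.names[seq_along(grps)], lengths(grps))
to_group <- make_switcher(keys, values, default = "EMPTY")

my_data$Group <- to_group(my_data$Position)
print(my_data)
```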

Answer 2

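I believe your question implies that the grps variable is a list, and each element of that list is itself a vector containing all the positions that belong to that group.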

Specifically, with the grps variable below, if Position is "A1" or "B1", it belongs to the first entry of grp.names. Likewise, if Position is "C1" or "D1", it belongs to the second entry of grp.names.

> grps
[[1]]
[1] "A1" "B1"

[[2]]
[1] "C1" "D1"

Assuming that is the case, you can do the following:

matching_group_df <- sapply(grps, function(x){ my_data$Position %in% x})
selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})
my_data$Group <- grp.names[selected_group]

  Position   Group Data
1       A1 Control    1
2       B1 Control    2
3       C1    Exp1    3
4       D1    Exp1    4

It works as follows:

matching_group_df is a TRUE/FALSE matrix (created with sapply) showing which group index each position belongs to:

> matching_group_df
      [,1]  [,2]
[1,]  TRUE FALSE
[2,]  TRUE FALSE
[3,] FALSE  TRUE
[4,] FALSE  TRUE

Then you use apply to select, row by row, the column that holds the TRUE value:

selected_group <- apply(matching_group_df, 1, function(x){which(x == TRUE)})

> selected_group
[1] 1 1 2 2

Finally, you use these indices to select the matching entries of grp.names and assign them to your original data frame.

grp.names[selected_group]
[1] "Control" "Control" "Exp1"    "Exp1"

If it matters to you, this also has the small benefit of using only base R functions.
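The apply step assumes every position matches exactly one group: a position in no group makes `which()` return `integer(0)`, `apply()` then returns a list, and indexing grp.names with it fails. A base-R sketch, assuming unmatched positions should get the entry after the last group (`"EMPTY"`) and that the first matching group wins, as in `case_when()`; `"E1"` is a made-up extra position:

```r
my_data <- data.frame(Position = c('A1','B1','C1','D1','E1'), Data = 1:5)
grps <- list(c('A1','B1'), c('C1','D1'))
grp.names <- c("Control", "Exp1", "EMPTY")

# One logical column per group; cbind() returns a matrix even for a single row
matching_group_df <- do.call(cbind, lapply(grps, function(x) my_data$Position %in% x))

# Index of the first TRUE in each row, or length(grps) + 1 when there is none
selected_group <- apply(matching_group_df, 1, function(x) {
  hit <- which(x)
  if (length(hit) == 0) length(grps) + 1 else hit[1]
})

my_data$Group <- grp.names[selected_group]
print(my_data)
```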

Answer 3

For each item in my_data$Position, loop through the groups in grps looking for a match, and if one is found, assign the corresponding grp.names entry. If there is no match in any group, assign the fallback name that follows the last group (grp.names[3], "EMPTY", in this example):

library(magrittr)  # for %>%

my_data$Group <- lapply(my_data$Position, function(position){ # Goes through each my_data$Position
  for(i in seq_along(grps)){
    if(position %in% grps[[i]]){
      return(grp.names[i]) # Return the grp.names entry for the matching group
    }
  }
  grp.names[length(grps) + 1] # No match in any group: use the fallback name
}) %>% unlist() # Put the list into a vector
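The loop above visits every row and, inside that, every group. A variant with a single loop over the groups, writing each group's name into all matching rows at once, also handles any length of grps. A sketch, assuming the fallback name is the entry after the last group; looping in reverse lets earlier groups overwrite later ones, so the first matching group wins, as in `case_when()`:

```r
my_data <- data.frame(Position = c('A1','B1','C1','D1','E1'), Data = 1:5)
grps <- list(c('A1','B1'), c('C1','D1'))
grp.names <- c("Control", "Exp1", "EMPTY")

# Start every row at the fallback name ("EMPTY" here)
my_data$Group <- grp.names[length(grps) + 1]

# Overwrite matching rows group by group; reverse order gives earlier groups precedence
for (i in rev(seq_along(grps))) {
  my_data$Group[my_data$Position %in% grps[[i]]] <- grp.names[i]
}
print(my_data)
```

Unlike the question's attempt, the loop writes into the column on each iteration instead of assigning the loop itself.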
