Applying if else statements involving two dataframe columns in R

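I am trying to modify a data frame with two columns by adding a third column that returns one of four possible strings, depending on the contents of the other two columns (i.e. whether each value is positive or negative).

I have tried a couple of approaches: the 'mutate' function from dplyr, and sapply. Unfortunately I seem to be missing something, because I get the message "the condition has length > 1 and only the first element will be used". As a result, only the outcome of the first iteration is applied to every row of the new column.

A reproducible example (of the mutate approach I tried) is below: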

Costs <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)

results <- as.data.frame(cbind(Costs, Outcomes))
results

quadrant <- function(cost,outcome) {
        if (costs < 0 &
            outcomes < 0) {
                "SW Quadrant"
        }
        else if (costs<0 & outcomes>0){
                "Dominant"
        } 
        else if (costs>0 & outcomes<0){
                "Dominated"
        }
        else{""}
}


results <- mutate(results,Quadrant = quadrant(Costs,Outcomes)
        )

The full warning message is:

Warning messages:
1: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
2: In if (costs < 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used
3: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
4: In if (costs < 0 & outcomes > 0) { :
  the condition has length > 1 and only the first element will be used
5: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
6: In if (costs > 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used

My attempt with the sapply function:

results <- sapply(results$Quadrant,quadrant(results$Costs,results$Outcomes))

resulted in the following error, along with warning messages consistent with those from the mutate approach.

Error in get(as.character(FUN), mode = "function", envir = envir) : object 'Dominated' of mode 'function' was not found

I'm sure I'm missing something obvious here. Thanks for any advice.

You probably want something more like this:

library(dplyr)  # provides %>%, mutate() and case_when()

costs <- c(2, -5, -7, 3, 12)
outcomes <- c(-2, 5, -7, 3, -2)

results <- as.data.frame(cbind(costs, outcomes))

results <- results %>% mutate(Quadrant = case_when(
  outcomes < 0 & costs < 0 ~ "SW Quadrant", 
  costs < 0 & outcomes > 0 ~ "Dominant", 
  costs > 0 & outcomes < 0 ~ "Dominated", 
  TRUE ~ ""))

results
#   costs outcomes    Quadrant
# 1     2       -2   Dominated
# 2    -5        5    Dominant
# 3    -7       -7 SW Quadrant
# 4     3        3            
# 5    12       -2   Dominated

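There are two problems with the function.

  1. You define the function with cost but use costs in the body (the same goes for outcome and outcomes);
  2. You use if, which strictly requires a logical condition of length 1. That causes two problems: you use &, which should almost never appear bare like this in an if statement, and you are passing vectors, so cost < 0 returns a logical vector the same length as cost (greater than 1 here).

Suggestions: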
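To make the second point concrete, here is a minimal sketch. It reuses the first three values from the question, and `cond` and `scalar_cond` are illustrative names only. `&` works element by element and returns a vector, while `if` wants a single TRUE or FALSE; `&&` is the scalar form intended for that.

```r
cost    <- c(2, -5, -7)
outcome <- c(-2, 5, -7)

# `&` is element-wise: one logical value per row, not a single TRUE/FALSE
cond <- cost < 0 & outcome < 0
cond
# [1] FALSE FALSE  TRUE
length(cond)
# [1] 3

# `&&` is meant for length-1 values, which is what `if` needs
scalar_cond <- cost[3] < 0 && outcome[3] < 0
scalar_cond
# [1] TRUE
```

Depending on the R version, passing `cond` itself to `if` gives either the warning shown in the question or an outright error, so the sketch deliberately does not run that line.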

quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

quadrant_vec1 <- function(cost, outcome) {
  ifelse(cost < 0 & outcome < 0, "SW Quadrant",
         ifelse(cost < 0 & outcome > 0, "Dominant",
                ifelse(cost > 0 & outcome < 0, "Dominated",
                       "")))
}

quadrant_vec2 <- function(cost, outcome) {
  ifelse(cost < 0,
         ifelse(outcome < 0, "SW Quadrant", "Dominant"),
         ifelse(outcome < 0, "Dominated", ""))
}

quadrant_vec3 <- function(cost, outcome) {
  dplyr::case_when(
    cost < 0 & outcome < 0 ~ "SW Quadrant",
    cost < 0 & outcome > 0 ~ "Dominant",
    cost > 0 & outcome < 0 ~ "Dominated",
    TRUE ~ ""
  )
}

quadrant_vec4 <- function(cost, outcome) {
  data.table::fcase(
    cost < 0 & outcome < 0, "SW Quadrant",
    cost < 0 & outcome > 0, "Dominant",
    cost > 0 & outcome < 0, "Dominated",
    rep(TRUE, length(cost)), ""
  )
}

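The first function (quadrant_sgl) is for taking a single-operation (non-vectorized) function as it is and converting it into a vectorized one. If you are not familiar with the concept of vectorization, know that (1) R does it well, (2) R prefers it, and (3) this is not the best place for a lengthy discussion of it. Search for "R vectorization" and you should find plenty of material.

So the first one simply demonstrates what to do when a function cannot (because of time, programming skill, or anything else) be rewritten in a vectorization-friendly way: use Vectorize.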
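As a rough sketch of what Vectorize does here, the wrapped function can be compared with a direct mapply call. The names `quadrant_v`, `via_vectorize` and `via_mapply` are made up for this illustration, and quadrant_sgl is repeated so the block runs on its own.

```r
# Scalar-only function, same logic as quadrant_sgl above
quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  ""
}

costs    <- c(2, -5, -7, 3, 12)
outcomes <- c(-2, 5, -7, 3, -2)

# Vectorize() returns a new function that loops over the arguments for you
quadrant_v    <- Vectorize(quadrant_sgl)  # quadrant_v is a hypothetical name
via_vectorize <- quadrant_v(costs, outcomes)

# Roughly the same thing spelled out with mapply()
via_mapply <- mapply(quadrant_sgl, costs, outcomes)

via_vectorize
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
```

Note that this is a convenience rather than a speed-up: the scalar function is still called once per element.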

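The other functions are all roughly equivalent. They agree on the example data; the one difference is that quadrant_vec2 never tests for > 0, so it treats exact zeros differently (for instance, a cost of 0 with a negative outcome is labelled "Dominated" there and "" by the others).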

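If you are using dplyr and friends, then I strongly suggest quadrant_vec3, as it is much easier to read and maintain than nested ifelse (IMO). (BTW: if you must use nested ifelse, then at least use nested dplyr::if_else, as it is generally safer than base R's ifelse.)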
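One way to see what "safer" can mean in practice is how the two functions treat attributes such as the Date class. The sketch below is illustrative only: the dates and the names `d`, `cutoff`, `base_res` and `dplyr_res` are invented for it, and the dplyr part is guarded in case the package is not installed.

```r
d      <- as.Date(c("2021-01-01", "2021-06-01"))
cutoff <- as.Date("2021-03-01")

# Base ifelse() drops the Date class and returns the underlying numbers
base_res <- ifelse(d > cutoff, d, as.Date(NA))
class(base_res)
# [1] "numeric"

# dplyr::if_else() keeps the class of its inputs
if (requireNamespace("dplyr", quietly = TRUE)) {
  dplyr_res <- dplyr::if_else(d > cutoff, d, as.Date(NA))
  print(class(dplyr_res))
  # [1] "Date"
}
```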

If you are exploring the world of data.table, then quadrant_vec4 is the equivalent using data.table's own fcase function, which is comparable to case_when.

Demo:

Vectorize(quadrant_sgl, vectorize.args = c("cost", "outcome"))(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
quadrant_vec1(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
quadrant_vec2(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"  
quadrant_vec3(results$Costs, results$Outcomes)
# [1] "Dominated"   "Dominant"    "SW Quadrant" ""            "Dominated"
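Since the original goal was a third column, any of the vectorized versions can be assigned straight into the data frame. A minimal base R sketch follows, assuming the question's column names and repeating quadrant_vec1 so it runs on its own:

```r
Costs    <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)
results  <- data.frame(Costs, Outcomes)

quadrant_vec1 <- function(cost, outcome) {
  ifelse(cost < 0 & outcome < 0, "SW Quadrant",
         ifelse(cost < 0 & outcome > 0, "Dominant",
                ifelse(cost > 0 & outcome < 0, "Dominated",
                       "")))
}

# The function returns one label per row, so it can be assigned as a column
results$Quadrant <- quadrant_vec1(results$Costs, results$Outcomes)
results
```

With dplyr loaded, `mutate(results, Quadrant = quadrant_vec1(Costs, Outcomes))` should give the same column.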