Applying if else statements involving two dataframe columns in R
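I'm trying to modify a dataframe with two columns by adding a third column that takes one of four possible values, depending on the contents of the other columns (i.e. whether each column is positive or negative).
I have tried a couple of approaches: the mutate function from dplyr, and sapply. Unfortunately I seem to be missing something, because I get the warning "the condition has length > 1 and only the first element will be used". As a result, the value computed for the first row is applied to every row of the new column.
A reproducible example (the mutate approach I tried) is below: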
library(dplyr)

Costs <- c(2, -5, -7, 3, 12)
Outcomes <- c(-2, 5, -7, 3, -2)
results <- as.data.frame(cbind(Costs, Outcomes))
results

quadrant <- function(cost, outcome) {
  if (costs < 0 & outcomes < 0) {
    "SW Quadrant"
  } else if (costs < 0 & outcomes > 0) {
    "Dominant"
  } else if (costs > 0 & outcomes < 0) {
    "Dominated"
  } else {
    ""
  }
}

results <- mutate(results, Quadrant = quadrant(Costs, Outcomes))
The full warning message is:
Warning messages:
1: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
2: In if (costs < 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used
3: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
4: In if (costs < 0 & outcomes > 0) { :
  the condition has length > 1 and only the first element will be used
5: Problem with mutate() input Quadrant.
i the condition has length > 1 and only the first element will be used
i Input Quadrant is quadrant(results$Costs, results$Outcomes).
6: In if (costs > 0 & outcomes < 0) { :
  the condition has length > 1 and only the first element will be used
My attempt with sapply:
results <- sapply(results$Quadrant,quadrant(results$Costs,results$Outcomes))
produces the following error, along with the same warning messages as the mutate approach:
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'Dominated' of mode 'function' was not found
I'm sure I'm missing something obvious here. Thanks for any suggestions.
You probably want something more like this:
library(dplyr)

costs <- c(2, -5, -7, 3, 12)
outcomes <- c(-2, 5, -7, 3, -2)
results <- as.data.frame(cbind(costs, outcomes))
results <- results %>% mutate(Quadrant = case_when(
  outcomes < 0 & costs < 0 ~ "SW Quadrant",
  costs < 0 & outcomes > 0 ~ "Dominant",
  costs > 0 & outcomes < 0 ~ "Dominated",
  TRUE ~ ""))
results
#   costs outcomes    Quadrant
# 1     2       -2   Dominated
# 2    -5        5    Dominant
# 3    -7       -7 SW Quadrant
# 4     3        3
# 5    12       -2   Dominated
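If your dplyr is version 1.1.0 or newer (an assumption; older versions lack the argument), case_when() also accepts a .default value, which can stand in for the catch-all TRUE ~ "" line. A minimal sketch on the same data, building the data frame with data.frame() directly instead of as.data.frame(cbind(...)):

```r
library(dplyr)

# Same toy data as above, built directly as a data frame
results <- data.frame(costs    = c(2, -5, -7, 3, 12),
                      outcomes = c(-2, 5, -7, 3, -2))

# case_when() checks the conditions top to bottom and keeps the first match;
# .default (dplyr >= 1.1.0) is used for rows where no condition is TRUE
results <- results %>%
  mutate(Quadrant = case_when(
    costs < 0 & outcomes < 0 ~ "SW Quadrant",
    costs < 0 & outcomes > 0 ~ "Dominant",
    costs > 0 & outcomes < 0 ~ "Dominated",
    .default = ""
  ))
results$Quadrant
```

Row 4 (3, 3) matches none of the three conditions, so it receives the .default value.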
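There are two problems with the function.
- You define the function with cost but then use costs inside it (same with outcome/outcomes);
- You use if, which strictly requires a logical condition of length 1, and this goes wrong in two ways: you use &, which should almost never appear bare like this in an if statement (&& is the length-1 form); and you are passing vectors, so cost < 0 returns a logical vector the same length as cost (here, greater than 1).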
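A quick way to see the second problem is to look at the condition itself. This sketch uses the question's numbers; R versions before 4.2.0 only warn and use the first element, while R 4.2.0 and later stop with an error, so the tryCatch() below simply reports which of the two happened on your installation:

```r
cost    <- c(2, -5, -7, 3, 12)
outcome <- c(-2, 5, -7, 3, -2)

# A comparison on vectors gives one TRUE/FALSE per element
cond <- cost < 0 & outcome < 0
cond          # FALSE FALSE  TRUE FALSE FALSE
length(cond)  # 5, but if() needs exactly one value

# if() cannot use all five values: R < 4.2.0 warns and keeps only the
# first element, R >= 4.2.0 signals an error. Report which one happened.
res <- tryCatch(
  if (cond) "SW Quadrant" else "",
  warning = function(w) "warning",
  error   = function(e) "error"
)
res

# ifelse() is the element-wise counterpart: one label per element
quad <- ifelse(cond, "SW Quadrant", "")
quad          # ""  ""  "SW Quadrant"  ""  ""
```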
Suggestions:
# Scalar version: handles one cost/outcome pair per call
quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

# Vectorized with nested base ifelse(), mirroring the scalar logic
quadrant_vec1 <- function(cost, outcome) {
  ifelse(cost < 0 & outcome < 0, "SW Quadrant",
         ifelse(cost < 0 & outcome > 0, "Dominant",
                ifelse(cost > 0 & outcome < 0, "Dominated",
                       "")))
}

# Vectorized, nesting on the sign of cost first
quadrant_vec2 <- function(cost, outcome) {
  ifelse(cost < 0,
         ifelse(outcome < 0, "SW Quadrant", "Dominant"),
         ifelse(outcome < 0, "Dominated", ""))
}

# Vectorized with dplyr::case_when(); TRUE ~ "" is the fallback
quadrant_vec3 <- function(cost, outcome) {
  dplyr::case_when(
    cost < 0 & outcome < 0 ~ "SW Quadrant",
    cost < 0 & outcome > 0 ~ "Dominant",
    cost > 0 & outcome < 0 ~ "Dominated",
    TRUE ~ ""
  )
}

# Vectorized with data.table::fcase(); the all-TRUE condition is the fallback
quadrant_vec4 <- function(cost, outcome) {
  data.table::fcase(
    cost < 0 & outcome < 0, "SW Quadrant",
    cost < 0 & outcome > 0, "Dominant",
    cost > 0 & outcome < 0, "Dominated",
    rep(TRUE, length(cost)), ""
  )
}
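The first function (quadrant_sgl) remains a single-operation (non-vectorized) function. If you are not familiar with the concept of vectorization, know that (1) R does it well, (2) R prefers it, and (3) this is not the best place to discuss it in detail. Search for "R vectorization" and you should find plenty of material on it.
So the first one just demonstrates what to do when a function cannot be converted into a vectorization-friendly one (due to time, programming skill, or other reasons): use Vectorize.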
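The same idea explains the sapply() attempt in the question: sapply() expects a function as its second argument, but quadrant(results$Costs, results$Outcomes) is a call, so its result (the string "Dominated", computed from the first row) is passed instead, and sapply() then tries to look that string up as a function name. A sketch of row-by-row alternatives with the scalar quadrant_sgl, assuming the question's column names: mapply() walks both columns in parallel, Vectorize() wraps essentially the same mapply() call, and vapply() adds a declared result type.

```r
quadrant_sgl <- function(cost, outcome) {
  if (cost < 0 && outcome < 0) return("SW Quadrant")
  if (cost < 0 && outcome > 0) return("Dominant")
  if (cost > 0 && outcome < 0) return("Dominated")
  return("")
}

results <- data.frame(Costs    = c(2, -5, -7, 3, 12),
                      Outcomes = c(-2, 5, -7, 3, -2))

# Pass the function itself (not a call to it); mapply() calls it once
# per (cost, outcome) pair, taking the i-th element of each column
quad_mapply <- mapply(quadrant_sgl, results$Costs, results$Outcomes)

# Vectorize() returns a wrapper that performs essentially the same mapply()
quad_vectorize <- Vectorize(quadrant_sgl)(results$Costs, results$Outcomes)

# vapply() over row indices, declaring one string per row as the result
quad_vapply <- vapply(seq_len(nrow(results)),
                      function(i) quadrant_sgl(results$Costs[i], results$Outcomes[i]),
                      character(1))

results$Quadrant <- quad_vapply
results
```

These loop in R one row at a time, so for large data the vectorized versions are usually the better choice.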
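The other functions are roughly equivalent, although quadrant_vec2 does not give the same result as the others when cost or outcome is exactly zero.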
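To see where quadrant_vec2 parts ways with the others, feed it hypothetical points that lie exactly on an axis. Because quadrant_vec2 only tests cost < 0 and outcome < 0, a zero falls into the "otherwise" branch of whichever ifelse() it reaches, whereas quadrant_vec1 (like vec3 and vec4) requires a strict inequality for every label:

```r
quadrant_vec1 <- function(cost, outcome) {
  ifelse(cost < 0 & outcome < 0, "SW Quadrant",
         ifelse(cost < 0 & outcome > 0, "Dominant",
                ifelse(cost > 0 & outcome < 0, "Dominated", "")))
}
quadrant_vec2 <- function(cost, outcome) {
  ifelse(cost < 0,
         ifelse(outcome < 0, "SW Quadrant", "Dominant"),
         ifelse(outcome < 0, "Dominated", ""))
}

# Hypothetical inputs sitting exactly on an axis
edge_cost    <- c(-1,  0, 0)
edge_outcome <- c( 0, -1, 0)

v1 <- quadrant_vec1(edge_cost, edge_outcome)
v2 <- quadrant_vec2(edge_cost, edge_outcome)
v1  # ""  ""  ""
v2  # "Dominant"  "Dominated"  ""
```

Which labelling is right for zeros depends on the analysis; the point is only that the two nestings are not interchangeable there.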
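If you are using dplyr and friends, then I strongly recommend quadrant_vec3, since (IMO) it is easier to read and maintain than nested ifelse. (As an aside: if you must nest, then at least use nested dplyr::if_else, since it is generally safer than base R's ifelse: it checks that the true and false values have compatible types and keeps their class, e.g. Date.)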
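A small illustration of what "safer" means here, assuming dplyr is installed: base ifelse() builds its result from the test vector, so the Date class of the yes/no values is lost, while dplyr::if_else() keeps it and refuses to mix incompatible types.

```r
library(dplyr)

d <- as.Date("2020-01-01")

# Base ifelse(): the result is built from `test`, so the Date class is
# dropped and only the underlying day count (days since 1970-01-01) remains
base_res <- ifelse(TRUE, d, as.Date(NA))
class(base_res)   # "numeric"
base_res          # 18262

# dplyr::if_else() keeps the class shared by `true` and `false`
dplyr_res <- if_else(TRUE, d, as.Date(NA))
class(dplyr_res)  # "Date"

# ...and signals an error instead of silently mixing double and character
mixed <- tryCatch(if_else(c(TRUE, FALSE), 1, "a"),
                  error = function(e) "error")
mixed             # "error"
```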
If you are venturing into data.table, then quadrant_vec4 is the equivalent using data.table's own fcase function, which works much like case_when.
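In data.table the labels would usually be added by reference with :=. fcase() also has a default argument, which can replace the rep(TRUE, length(cost)) catch-all. A sketch, assuming a data.table version that provides fcase() and the question's column names:

```r
library(data.table)

dt <- data.table(Costs    = c(2, -5, -7, 3, 12),
                 Outcomes = c(-2, 5, -7, 3, -2))

# fcase() takes condition/value pairs, evaluated in order; `default` is
# used for rows where no condition is TRUE. := adds the column in place.
dt[, Quadrant := fcase(
  Costs < 0 & Outcomes < 0, "SW Quadrant",
  Costs < 0 & Outcomes > 0, "Dominant",
  Costs > 0 & Outcomes < 0, "Dominated",
  default = ""
)]
dt
```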
Demo:
Vectorize(quadrant_sgl, vectorize.args = c("cost", "outcome"))(results$Costs, results$Outcomes)
# [1] "Dominated" "Dominant" "SW Quadrant" "" "Dominated"
quadrant_vec1(results$Costs, results$Outcomes)
# [1] "Dominated" "Dominant" "SW Quadrant" "" "Dominated"
quadrant_vec2(results$Costs, results$Outcomes)
# [1] "Dominated" "Dominant" "SW Quadrant" "" "Dominated"
quadrant_vec3(results$Costs, results$Outcomes)
# [1] "Dominated" "Dominant" "SW Quadrant" "" "Dominated"
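Because the vectorized versions return one label per row, any of them can be dropped straight into the mutate() call from the question. A sketch with quadrant_vec3, rebuilding the question's data so the block runs on its own:

```r
library(dplyr)

quadrant_vec3 <- function(cost, outcome) {
  dplyr::case_when(
    cost < 0 & outcome < 0 ~ "SW Quadrant",
    cost < 0 & outcome > 0 ~ "Dominant",
    cost > 0 & outcome < 0 ~ "Dominated",
    TRUE ~ ""
  )
}

results <- data.frame(Costs    = c(2, -5, -7, 3, 12),
                      Outcomes = c(-2, 5, -7, 3, -2))

# The question's mutate() call, now with a function that handles whole
# columns at once instead of a single value
results <- mutate(results, Quadrant = quadrant_vec3(Costs, Outcomes))
results$Quadrant
```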