Update one column (with NAs) based on the conditions of another column using R
Given a small dataset as follows:
df <- structure(list(date = c("2021-09", "2021-10", "2021-11", "2021-12",
"2021-06", "2021-10"), act_direction = c("decrease", "increase",
NA, NA, "unchanged", "unchanged"), pred_direction = c(NA, "decrease",
NA, NA, "decrease", "increase"), direction_acc = c("true", "-",
"-", "true", "false", "false")), class = "data.frame", row.names = c(NA,
-6L))
df:
     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>             -
4 2021-12          <NA>           <NA>          true
5 2021-06     unchanged       decrease         false
6 2021-10     unchanged       increase         false
I am trying to update direction_acc based on the act_direction column. More specifically, if act_direction == "unchanged", I want to set direction_acc to true, ignoring its original value.
Expected result:
     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>             -
4 2021-12          <NA>           <NA>          true
5 2021-06     unchanged       decrease          true
6 2021-10     unchanged       increase          true
I used the code below. It runs without errors, but the result is unexpected: other values in direction_acc have been changed to NA:
library(dplyr)

df %>%
  # mutate_all(na_if, '') %>%  # replaces empty cells '' with NA, as in the example data
  mutate(direction_acc = ifelse(act_direction == 'unchanged',
                                'true',
                                as.character(direction_acc)))
Result:
     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>          <NA>
4 2021-12          <NA>           <NA>          <NA>
5 2021-06     unchanged       decrease          true
6 2021-10     unchanged       increase          true
So my question is: why does direction_acc also become NA wherever act_direction is NA, and how can this be implemented correctly?
Base R
df$direction_acc[df$act_direction == "unchanged"] <- "true"
df
#      date act_direction pred_direction direction_acc
# 1 2021-09                                       true
# 2 2021-10                     decrease             -
# 3 2021-11                                          -
# 4 2021-12                                       true
# 5 2021-06     unchanged       decrease          true
# 6 2021-10     unchanged       increase          true
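For what it's worth, the same base R assignment should also work on the question's data with real NA values, because a logical NA in the index selects nothing when the replacement has length one. A minimal sketch that makes this explicit with which(); the variable name rows is only for illustration:

```r
# Rebuild the question's data, with real NA values in act_direction.
df <- data.frame(
  date = c("2021-09", "2021-10", "2021-11", "2021-12", "2021-06", "2021-10"),
  act_direction = c("decrease", "increase", NA, NA, "unchanged", "unchanged"),
  pred_direction = c(NA, "decrease", NA, NA, "decrease", "increase"),
  direction_acc = c("true", "-", "-", "true", "false", "false"),
  stringsAsFactors = FALSE
)

# which() keeps only the positions where the comparison is TRUE,
# so rows where act_direction is NA are never touched.
rows <- which(df$act_direction == "unchanged")
df$direction_acc[rows] <- "true"

print(df$direction_acc)
```

Using which() is optional here, but it documents the intent that NA rows are left alone.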
dplyr
library(dplyr)
df %>%
  mutate(
    direction_acc = if_else(act_direction == "unchanged", "true", direction_acc)
  )
If I add NAs as you do (I will use across, since mutate_all and the other scoped verbs have been superseded), then we can change == to %in% to get the desired effect.
df %>%
  mutate(across(where(is.character), ~ na_if(., ""))) %>%
  mutate(
    direction_acc = if_else(act_direction %in% "unchanged", "true", direction_acc)
  )
#      date act_direction pred_direction direction_acc
# 1 2021-09          <NA>           <NA>          true
# 2 2021-10          <NA>       decrease             -
# 3 2021-11          <NA>           <NA>             -
# 4 2021-12          <NA>           <NA>          true
# 5 2021-06     unchanged       decrease          true
# 6 2021-10     unchanged       increase          true
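As for why the NAs appear: comparing NA with == yields NA rather than FALSE, and ifelse() returns NA wherever its test is NA (if_else() does the same unless a missing argument is supplied). %in% is built on match() and never returns NA. A small sketch on toy vectors, where act and acc are made-up stand-ins for the two columns:

```r
# Toy stand-ins for act_direction and direction_acc (illustrative only).
act <- c("decrease", NA, "unchanged")
acc <- c("true", "-", "false")

# == propagates NA: the comparison for the NA element is NA, not FALSE.
eq_test <- act == "unchanged"            # FALSE NA TRUE

# ifelse() yields NA where the test is NA, so the original "-" is lost.
with_eq <- ifelse(eq_test, "true", acc)  # "true" NA "true"

# %in% always returns TRUE or FALSE, so NA rows fall through to acc.
in_test <- act %in% "unchanged"          # FALSE FALSE TRUE
with_in <- ifelse(in_test, "true", acc)  # "true" "-" "true"

print(with_eq)
print(with_in)
```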