Update one column (with NAs) based on the conditions of another column using R

Given a small dataset as follows:

df <- structure(list(date = c("2021-09", "2021-10", "2021-11", "2021-12", 
"2021-06", "2021-10"), act_direction = c("decrease", "increase", 
NA, NA, "unchanged", "unchanged"), pred_direction = c(NA, "decrease", 
NA, NA, "decrease", "increase"), direction_acc = c("true", "-", 
"-", "true", "false", "false")), class = "data.frame", row.names = c(NA, 
-6L))

df:

     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>             -
4 2021-12          <NA>           <NA>          true
5 2021-06     unchanged       decrease         false
6 2021-10     unchanged       increase         false

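I am trying to update direction_acc based on the act_direction column. More specifically, if act_direction == "unchanged", I want to set direction_acc to "true", regardless of its original value.

Expected result: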

     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>             -
4 2021-12          <NA>           <NA>          true
5 2021-06     unchanged       decrease          true
6 2021-10     unchanged       increase          true

I used the code below. It runs without errors, but it returns an unexpected result: other values in direction_acc have been changed to NA:

df %>% 
  # mutate_all(na_if, '') %>% # replaces empty cells '' with NA, as in the example data
  mutate(direction_acc = ifelse(act_direction == 'unchanged', 
                                'true', 
                                as.character(direction_acc)))

Result:

     date act_direction pred_direction direction_acc
1 2021-09      decrease           <NA>          true
2 2021-10      increase       decrease             -
3 2021-11          <NA>           <NA>          <NA>
4 2021-12          <NA>           <NA>          <NA>
5 2021-06     unchanged       decrease          true
6 2021-10     unchanged       increase          true

So my question is: why does direction_acc also become NA when act_direction is NA, and how can this be implemented correctly?

Base R

df$direction_acc[df$act_direction == "unchanged"] <- "true"
df
#      date act_direction pred_direction direction_acc
# 1 2021-09                                       true
# 2 2021-10                     decrease             -
# 3 2021-11                                          -
# 4 2021-12                                       true
# 5 2021-06     unchanged       decrease          true
# 6 2021-10     unchanged       increase          true
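
The assignment above tolerates NA in the logical subscript only because the replacement value has length one; with a longer replacement R stops with "NAs are not allowed in subscripted assignments". A minimal sketch of a variant that does not rely on this, assuming a shortened copy of the question's data and an illustrative index name idx (neither is from the original answer):

```r
# Shortened copy of the question's data (only the two relevant columns)
df <- data.frame(
  act_direction = c("decrease", "increase", NA, NA, "unchanged", "unchanged"),
  direction_acc = c("true", "-", "-", "true", "false", "false"),
  stringsAsFactors = FALSE
)

# which() keeps only the positions where the comparison is TRUE,
# so rows where act_direction is NA are dropped from the index
idx <- which(df$act_direction == "unchanged")

# Overwrite direction_acc at those positions only
df$direction_acc[idx] <- "true"
print(df)
```

Using df$act_direction %in% "unchanged" as the subscript would work equally well, since %in% never returns NA.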

dplyr

library(dplyr)
df %>%
  mutate(
    direction_acc = if_else(act_direction == "unchanged", "true", direction_acc)
  )

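If I add the NAs as you did (using across, since the scoped verbs such as mutate_all and mutate_if are superseded), we can change == to %in% to get the desired result. NA == "unchanged" evaluates to NA, whereas NA %in% "unchanged" evaluates to FALSE, so rows with a missing act_direction keep their original direction_acc.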

df %>%
  mutate(across(where(is.character), ~ na_if(., ""))) %>%
  mutate(
    direction_acc = if_else(act_direction %in% "unchanged", "true", direction_acc)
  )
#      date act_direction pred_direction direction_acc
# 1 2021-09          <NA>           <NA>          true
# 2 2021-10          <NA>       decrease             -
# 3 2021-11          <NA>           <NA>             -
# 4 2021-12          <NA>           <NA>          true
# 5 2021-06     unchanged       decrease          true
# 6 2021-10     unchanged       increase          true
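
As for why the original ifelse() call produced NA: a comparison with == against a missing value is itself missing, and ifelse() returns NA wherever its test is NA, so neither branch is used for those rows. A small base R sketch of this behaviour, using made-up vectors act and acc that mirror act_direction and direction_acc:

```r
# Hypothetical vectors mirroring act_direction and direction_acc
act <- c("decrease", NA, "unchanged")
acc <- c("true", "-", "false")

# `==` propagates NA: the second element of the test is NA, not FALSE
eq_test <- act == "unchanged"
print(eq_test)

# `%in%` is built on match() and only ever returns TRUE or FALSE
in_test <- act %in% "unchanged"
print(in_test)

# ifelse() yields NA where the test is NA, so the original value is lost
res_eq <- ifelse(eq_test, "true", acc)
print(res_eq)

# With %in% the NA row falls into the "no" branch and keeps its value
res_in <- ifelse(in_test, "true", acc)
print(res_in)
```

The same reasoning applies to dplyr's if_else(), which is why switching the condition to %in% is enough.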