How to create a new variable conditional on missingness in other variables in dplyr, R?

Consider this data:

library(dplyr)

d <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
            student.school.hs = c(NA, 1, NA, NA, NA, NA),
            student.school.alths = c(NA, NA, NA, NA, NA, 1),
            student.school.allNA = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)) 

  student.status student.school.hs student.school.alt… student.school.…
           <dbl>             <dbl>               <dbl> <lgl>           
1              0                NA                  NA TRUE            
2              1                 1                  NA FALSE           
3             NA                NA                  NA TRUE            
4              0                NA                  NA TRUE            
5              1                NA                  NA TRUE            
6              1                NA                   1 FALSE 

The final data should look like this:

  student.status student.school.hs student.school.alt… student.school.…
           <dbl>             <dbl>               <dbl> <lgl>           
1              0                NA                  NA TRUE            
2              1                 1                   0 FALSE           
3             NA                NA                  NA TRUE            
4              0                NA                  NA TRUE            
5              1                NA                  NA TRUE            
6              1                 0                   1 FALSE     

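Maybe this helps: loop across the columns whose names start with the prefix 'student.school' (starts_with), while dropping the logical column from the selection (-where(is.logical), because student.school.allNA has the same prefix but a different column type). Then use case_when to change a column's value to 0 when it is NA, student.school.allNA is FALSE (hence the negation, !), and student.status is 1.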

library(dplyr)
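d <- d %>%
   # apply to every 'student.school' column except the logical flag column
   mutate(across(c(starts_with('student.school'), -where(is.logical)),
   # fill NA with 0 only where status is 1 and the row is not all-NA
   ~ case_when(student.status %in% 1 & !student.school.allNA & is.na(.x) ~ 0,
     TRUE ~ .x)))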

-output

> d
# A tibble: 6 × 4
  student.status student.school.hs student.school.alths student.school.allNA
           <dbl>             <dbl>                <dbl> <lgl>               
1              0                NA                   NA TRUE                
2              1                 1                    0 FALSE               
3             NA                NA                   NA TRUE                
4              0                NA                   NA TRUE                
5              1                NA                   NA TRUE                
6              1                 0                    1 FALSE
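
As a side note, the flag column student.school.allNA does not have to be typed by hand. The following is only a minimal sketch, assuming dplyr >= 1.0.4 (for if_all) and that the flag is meant to be TRUE exactly when every school column in the row is NA. The vector school_cols and the result name res are introduced here for illustration and are not part of the question. It also uses if_else in place of case_when, which should behave the same here because the condition never evaluates to NA.

```r
library(dplyr)

# Same input as in the question, but without the hand-typed flag column.
d <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
            student.school.hs = c(NA, 1, NA, NA, NA, NA),
            student.school.alths = c(NA, NA, NA, NA, NA, 1))

# Hypothetical helper: the columns that hold school information.
school_cols <- c("student.school.hs", "student.school.alths")

res <- d %>%
  # TRUE when every school column is NA in that row
  mutate(student.school.allNA = if_all(all_of(school_cols), is.na)) %>%
  # fill NA with 0 only for students (status 1) with at least one school value
  mutate(across(all_of(school_cols),
                ~ if_else(student.status %in% 1 & !student.school.allNA & is.na(.x),
                          0, .x)))

res
```

With the sample data, the derived flag matches the values typed in the question, and the school columns match the desired output shown there.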