How to create new variable conditional on missingness on others in dplyr, R?
Consider this data:
library(dplyr)
d <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
            student.school.hs = c(NA, 1, NA, NA, NA, NA),
            student.school.alths = c(NA, NA, NA, NA, NA, 1),
            student.school.allNA = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))
student.status student.school.hs student.school.alths student.school.allNA
<dbl> <dbl> <dbl> <lgl>
1 0 NA NA TRUE
2 1 1 NA FALSE
3 NA NA NA TRUE
4 0 NA NA TRUE
5 1 NA NA TRUE
6 1 NA 1 FALSE
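The student.school.allNA column in the example is supplied precomputed. As a sketch, assuming dplyr >= 1.0.5 (for if_all()), it can be derived from the two numeric school columns and checked against the supplied values. The allNA_check column and derived variable are hypothetical names used only for this check:

```r
library(dplyr)

d <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
            student.school.hs = c(NA, 1, NA, NA, NA, NA),
            student.school.alths = c(NA, NA, NA, NA, NA, 1),
            student.school.allNA = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE))

# TRUE in rows where every numeric student.school.* column is NA
derived <- d %>%
  mutate(allNA_check = if_all(c(student.school.hs, student.school.alths), is.na)) %>%
  pull(allNA_check)

# Compare the derived flag with the column supplied in the data
identical(derived, d$student.school.allNA)
```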
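I want to replace the NA values in the student.school.* columns with 0 when student.status == 1 and not all of the student.school.* columns are NA (that is, at least one of them has a value).
If all student.school.* columns are NA and student.status == 1, leave them NA.
If student.status == 0, all student.school.* columns should stay NA.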
The final data should look like this:
student.status student.school.hs student.school.alths student.school.allNA
<dbl> <dbl> <dbl> <lgl>
1 0 NA NA TRUE
2 1 1 0 FALSE
3 NA NA NA TRUE
4 0 NA NA TRUE
5 1 NA NA TRUE
6 1 0 1 FALSE
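Perhaps this helps: loop across the columns whose names starts_with the prefix 'student.school', while removing the logical column from the selection (-where(is.logical), because student.school.allNA has the same prefix but a different column type). Then use case_when to change a column's value to 0 when it is NA, student.school.allNA is FALSE (negated with !), and student.status is 1.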
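library(dplyr)
d <- d %>%
  # Select the numeric student.school.* columns; drop the logical helper column
  mutate(across(c(starts_with('student.school'), -where(is.logical)),
                # Fill NA with 0 only for status 1 rows that have at least one school value
                ~ case_when(student.status %in% 1 & !student.school.allNA & is.na(.x) ~ 0,
                            TRUE ~ .x)))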
-output
> d
# A tibble: 6 × 4
student.status student.school.hs student.school.alths student.school.allNA
<dbl> <dbl> <dbl> <lgl>
1 0 NA NA TRUE
2 1 1 0 FALSE
3 NA NA NA TRUE
4 0 NA NA TRUE
5 1 NA NA TRUE
6 1 0 1 FALSE
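If the precomputed student.school.allNA column is not available, the same rule can be expressed by computing the "at least one school column is filled" condition on the fly. A minimal sketch, assuming dplyr >= 1.0.5 (for if_any()) and that the school columns are exactly student.school.hs and student.school.alths; school_cols and any_school are hypothetical names introduced for illustration:

```r
library(dplyr)

d <- tibble(student.status = c(0, 1, NA, 0, 1, 1),
            student.school.hs = c(NA, 1, NA, NA, NA, NA),
            student.school.alths = c(NA, NA, NA, NA, NA, 1))

# Hypothetical list of the numeric school columns to fill
school_cols <- c("student.school.hs", "student.school.alths")

d2 <- d %>%
  # TRUE when at least one student.school.* column is non-missing in the row
  mutate(any_school = if_any(all_of(school_cols), ~ !is.na(.x))) %>%
  # Replace NA with 0 only for status 1 rows that have some school recorded;
  # NA %in% 1 is FALSE, so rows with a missing status are left unchanged
  mutate(across(all_of(school_cols),
                ~ if_else(student.status %in% 1 & any_school & is.na(.x), 0, .x))) %>%
  # Drop the temporary helper column
  select(-any_school)

d2
```

if_else() is used here because there are only two branches; it is stricter about types than base ifelse(), so the replacement 0 must be a double to match the double school columns. The helper is computed in its own mutate() step so that the condition is fixed before any school column is modified.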