Creating a group identifying variable with case_when
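I have an individual-level dataset of large households, so there is a variable that identifies each respondent's relationship to the household head (parent, child, sibling, etc.).
I would like to create a variable that identifies their "generation group".
My groups are: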
gen0 <- c("grandparent", "grandparent_ofwife")
gen1 <- c("parent", "parent_inlaw", "parent_ofcohab")
gen2 <- c("head", "wife_legal", "wife_cohabit", "husband_legal", "y1_cohab")
gen3 <- c("child", "child_step", "child_ofwife", "child_inlaw", "child_foster", "child_1y_cohab")
I tried to use case_when to create a new "generation" variable, with the following code:
dat2<- dat %>% mutate('2017_generation' = case_when('2017_relation_head' %in% gen0 ~ "gen0",
'2017_relation_head' %in% gen1 ~ "gen1",
'2017_relation_head' %in% gen2 ~ "gen2",
'2017_relation_head' %in% gen3 ~ "gen3"))
However, the new variable "2017_generation" ends up filled entirely with NA values. Any idea what I am doing wrong? (Sample data below.)
id 2017_relation_head
1 wife_legal
2 head
3 wife_legal
4 head
5 wife_legal
6 head
7 wife_legal
8 child
9 child
10 NA
11 child
12 child
13 child
14 child
15 child
16 head
17 parent
18 NA
19 grandchild
20 child_step
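The following works. I think the main problem is the quotes around the variable name: written in single quotes, '2017_relation_head' is a character string rather than a reference to the column, so every %in% test is FALSE and case_when returns NA for all rows. In addition, a column name that starts with a digit is not a syntactic R name, so it has to be either wrapped in backticks or renamed (here with an x prefix).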
gen0 <- c("grandparent", "grandparent_ofwife")
gen1 <- c("parent", "parent_inlaw", "parent_ofcohab")
gen2 <- c("head", "wife_legal", "wife_cohabit", "husband_legal", "y1_cohab")
gen3 <- c("child", "child_step", "child_ofwife", "child_inlaw", "child_foster", "child_1y_cohab")
library(dplyr)
# Simulated data: 100 relationships drawn at random from the four groups
dat <- data.frame("x2017_relation_head" = sample(c(gen0, gen1, gen2, gen3),
                                                 size = 100, replace = TRUE))
# Make sure the column is character rather than factor
dat$x2017_relation_head <- as.character(dat$x2017_relation_head)
# Refer to the column by its bare (unquoted) name inside case_when
dat2 <- dat %>% mutate(genx =
  case_when(x2017_relation_head %in% gen0 ~ "gen0",
            x2017_relation_head %in% gen1 ~ "gen1",
            x2017_relation_head %in% gen2 ~ "gen2",
            x2017_relation_head %in% gen3 ~ "gen3"))
head(dat2)
x2017_relation_head genx
1 child_1y_cohab gen3
2 child_inlaw gen3
3 child_step gen3
4 husband_legal gen2
5 child_step gen3
6 child_inlaw gen3
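If renaming the columns is not an option, the original names can be kept by wrapping them in backticks. The following is a minimal sketch under that assumption, not the only way to do it. The six-row dat built here is a made-up subset of the sample data from the question, used only for illustration. The first expression shows why the quoted version produced NA everywhere.

```r
library(dplyr)

gen0 <- c("grandparent", "grandparent_ofwife")
gen1 <- c("parent", "parent_inlaw", "parent_ofcohab")
gen2 <- c("head", "wife_legal", "wife_cohabit", "husband_legal", "y1_cohab")
gen3 <- c("child", "child_step", "child_ofwife", "child_inlaw", "child_foster", "child_1y_cohab")

# A quoted name is just a length-one string, not a column reference,
# so this compares the text "2017_relation_head" itself against gen0.
"2017_relation_head" %in% gen0

# Illustrative data (assumption: a small subset of the question's sample).
# check.names = FALSE keeps the name that starts with a digit unchanged.
dat <- data.frame(
  id = 1:6,
  `2017_relation_head` = c("wife_legal", "head", "child", NA, "parent", "grandchild"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Backticks make R treat the non-syntactic name as a column reference.
dat2 <- dat %>%
  mutate(`2017_generation` = case_when(
    `2017_relation_head` %in% gen0 ~ "gen0",
    `2017_relation_head` %in% gen1 ~ "gen1",
    `2017_relation_head` %in% gen2 ~ "gen2",
    `2017_relation_head` %in% gen3 ~ "gen3"
  ))

dat2
```

Rows that match none of the groups, such as NA or "grandchild" in the sample data, still get NA in the new column, because case_when returns NA when no condition is TRUE.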