How to Create Conditional Dummy Variables (Panel Data) in R?
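My panel data consists of two waves: 18 and 21. My employment status variable has 4 values.
I want to create a dummy that takes the value 1 if the person is employed in both waves and 0 otherwise. However, my attempt failed: the code produces a dummy that contains only zeros: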
df$dummy <- df %>%
group_by(NEW_id) %>%
arrange(New_id, WAVE_NO) %>%
mutate(dummy = case_when(WAVE_NO==18 & WAVE_NO==21 & EMPLOYMENT_STATUS=="Employed" ~ 1, TRUE ~ 0))
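We can use split() to split the data frame by id. Since split() returns a list, we can use lapply() to perform an operation on each element of that list (here: creating the dummy variable). The output of lapply() is also a list. However, we want a data.frame, so we call do.call(), which applies a function to all elements of the list at once (here: rbind).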
set.seed(1)
n <- 10L
K <- 2L
df <- data.frame(
id = rep(1L:n, each=K),
wave = rep(c(18L,21L), n),
employment = sample(c('Employed', 'Unemployed'), n*K, replace = TRUE)
)
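# Add the dummy to the data frame, one id at a time
df <- do.call(rbind, lapply(split(df, df$id), function(x) {
  # 1 for each wave in which the person is employed
  x$dummy <- ifelse(x$employment %in% 'Employed', 1L, 0L)
  # 1 only if the person is employed in both waves
  x$dummy <- ifelse(sum(x$dummy) == 2L, 1L, 0L)
  return(x)
}))
# Drop the row names that rbind derives from the list names
rownames(df) <- NULL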
Output
> head(df)
id wave employment dummy
1 1 18 Employed 0
2 1 21 Unemployed 0
3 2 18 Employed 1
4 2 21 Employed 1
5 3 18 Unemployed 0
6 3 21 Employed 0
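As a side note, the same per-id logic can probably be written without the split/rbind round trip by using base R's ave(), which applies a function within groups and returns a vector aligned with the original rows. This is only a sketch: the toy data below is made up for illustration and is not the data from the answer above.

```r
# Hypothetical toy data: three people, two waves each
df <- data.frame(
  id = rep(1:3, each = 2),
  wave = rep(c(18L, 21L), 3),
  employment = c("Employed", "Employed",
                 "Employed", "Unemployed",
                 "Unemployed", "Unemployed")
)

# Per-row indicator: 1 if employed in that wave, 0 otherwise
employed <- as.integer(df$employment == "Employed")

# Within each id, flag every row with 1 only if both waves are employed
df$dummy <- ave(
  employed,
  df$id,
  FUN = function(e) rep(as.integer(sum(e) == 2L), length(e))
)

print(df)
```

Like the split/lapply version, this assumes exactly two rows per id, so a sum of 2 means "employed in both waves".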
df <- data.frame(
stringsAsFactors = FALSE,
id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
EMPLOYMENT_STATUS = c(
"Employed",
"Employed",
"unemployed",
"Employed",
"unemployed",
"Employed",
"Employed",
"Employed",
"unemployed",
"unemployed"
)
)
library(tidyverse)
df %>%
  group_by(id) %>%
  mutate(dummy = +(all(wave %in% c(18, 21)) &
                     all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()
#> # A tibble: 10 x 4
#> id wave EMPLOYMENT_STATUS dummy
#> <int> <int> <chr> <int>
#> 1 1 18 Employed 1
#> 2 1 21 Employed 1
#> 3 2 18 unemployed 0
#> 4 2 21 Employed 0
#> 5 3 18 unemployed 0
#> 6 3 21 Employed 0
#> 7 4 18 Employed 0
#> 8 4 10 Employed 0
#> 9 5 18 unemployed 0
#> 10 5 21 unemployed 0
Created on 2022-01-23 by the reprex package (v2.0.1)
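For completeness, here is a small sketch of why the attempt in the question yields only zeros. The condition WAVE_NO==18 & WAVE_NO==21 is evaluated row by row, and a single row cannot belong to both waves, so it is never TRUE. The variant below moves the check to the group level with any(). The column names follow the question, but the data is invented for illustration, and the rule "employed in wave 18 and in wave 21" is an assumption about what the asker intends.

```r
library(dplyr)

# Hypothetical toy data using the question's column names
df <- data.frame(
  NEW_id = c(1L, 1L, 2L, 2L),
  WAVE_NO = c(18L, 21L, 18L, 21L),
  EMPLOYMENT_STATUS = c("Employed", "Employed", "Employed", "Unemployed")
)

# Row-wise, a wave number cannot equal 18 and 21 at the same time
never_true <- any(df$WAVE_NO == 18 & df$WAVE_NO == 21)
print(never_true)  # FALSE

# Group-level check: employed in wave 18 and employed in wave 21
df <- df %>%
  group_by(NEW_id) %>%
  mutate(dummy = as.integer(
    any(WAVE_NO == 18 & EMPLOYMENT_STATUS == "Employed") &
      any(WAVE_NO == 21 & EMPLOYMENT_STATUS == "Employed")
  )) %>%
  ungroup()

print(df)
```

Note that the pipeline result is assigned to df itself rather than to df$dummy, since mutate() already returns the whole data frame with the new column.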