How to Create Conditional Dummy Variables (Panel Data) in R?

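My panel data consist of two waves, 18 and 21. The employment status variable takes 4 values.

I want to create a dummy that equals 1 if the person is employed in both waves and 0 otherwise. However, my attempt failed: the code produces a dummy that contains only zeros: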

df$dummy <- df %>%
  group_by(NEW_id) %>%
  arrange(New_id, WAVE_NO) %>%
  mutate(dummy = case_when(WAVE_NO==18 & WAVE_NO==21 & EMPLOYMENT_STATUS=="Employed" ~ 1, TRUE ~ 0))
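The condition WAVE_NO==18 & WAVE_NO==21 is evaluated row by row, and a single row cannot be in wave 18 and wave 21 at the same time, so it is always FALSE and every row falls through to TRUE ~ 0. The condition has to be evaluated per person instead. Also, df$dummy <- df %>% ... assigns the whole data frame returned by the pipeline to a single column; assigning the result to df itself keeps the new column. R is case-sensitive, so New_id and NEW_id are different names. Below is a minimal sketch of one possible fix, assuming the question's column names (NEW_id, WAVE_NO, EMPLOYMENT_STATUS) and a hypothetical five-row toy panel. arrange() is dropped because all() does not depend on row order.

```r
library(dplyr)

# hypothetical toy panel using the question's column names
df <- data.frame(
  NEW_id = c(1, 1, 2, 2, 3),
  WAVE_NO = c(18, 21, 18, 21, 18),
  EMPLOYMENT_STATUS = c("Employed", "Employed", "Employed", "Unemployed", "Employed"),
  stringsAsFactors = FALSE
)

# row-wise, WAVE_NO == 18 & WAVE_NO == 21 is never TRUE
any(df$WAVE_NO == 18 & df$WAVE_NO == 21)
#> [1] FALSE

# evaluate the condition per person: both waves observed and employed in all rows;
# the result is assigned to df itself, not to df$dummy
df <- df %>%
  group_by(NEW_id) %>%
  mutate(dummy = as.integer(all(c(18, 21) %in% WAVE_NO) &
                              all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()

df$dummy
#> [1] 1 1 0 0 0
```

Person 3 gets 0 because wave 21 is missing, even though the only observed row is "Employed".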

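We can use split() to split the data frame by id. Since split() returns a list, we can use lapply() to apply an operation to each element of that list (here: creating the dummy variable). The output of lapply() is also a list. However, we want a data.frame, so we call do.call(), which calls a function once with all elements of the list as its arguments (here: rbind).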

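set.seed(1)

# simulate a balanced panel: n persons, each observed in K = 2 waves (18 and 21)
n <- 10L
K <- 2L
df <- data.frame(
  id = rep(1L:n, each = K),
  wave = rep(c(18L, 21L), n),
  employment = sample(c('Employed', 'Unemployed'), n*K, replace = TRUE)
)

# add dummy to data frame:
# split by id, flag the 'Employed' rows, then set dummy = 1 for every row
# of a person only if both of that person's rows are 'Employed'
df <- do.call(rbind, lapply(split(df, df$id), function(x) {
  x$dummy <- ifelse(x$employment %in% 'Employed', 1L, 0L)
  x$dummy <- ifelse(sum(x$dummy) == 2L, 1L, 0L)
  return(x)
}))
# reset the row names that rbind builds from the list names
rownames(df) <- NULL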

Output

> head(df)
  id wave employment dummy
1  1   18   Employed     0
2  1   21 Unemployed     0
3  2   18   Employed     1
4  2   21   Employed     1
5  3   18 Unemployed     0
6  3   21   Employed     0
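The split / lapply / do.call steps of this answer can be traced one at a time on a tiny, hypothetical four-row panel. This sketch replaces the two ifelse() calls with a single equivalent test (both rows "Employed"); it is an illustration of the mechanics, not a change to the answer.

```r
# hypothetical four-row panel with the same layout as the simulated data
df <- data.frame(
  id = c(1L, 1L, 2L, 2L),
  wave = c(18L, 21L, 18L, 21L),
  employment = c("Employed", "Employed", "Employed", "Unemployed"),
  stringsAsFactors = FALSE
)

# step 1: split() returns a named list with one data frame per id
pieces <- split(df, df$id)
names(pieces)
#> [1] "1" "2"

# step 2: lapply() runs the function on each piece and returns a list
pieces <- lapply(pieces, function(x) {
  x$dummy <- as.integer(sum(x$employment == "Employed") == 2L)
  x
})

# step 3: do.call() passes every list element to a single rbind() call
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out$dummy
#> [1] 1 1 0 0
```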

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
  EMPLOYMENT_STATUS = c(
    "Employed",
    "Employed",
    "unemployed",
    "Employed",
    "unemployed",
    "Employed",
    "Employed",
    "Employed",
    "unemployed",
    "unemployed"
  )
)

library(tidyverse)
df %>%
  group_by(id) %>%
  # dummy = 1 only if both waves are observed and every row is "Employed"
  mutate(dummy = +(all(c(18, 21) %in% wave) &
                     all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()
#> # A tibble: 10 x 4
#>       id  wave EMPLOYMENT_STATUS dummy
#>    <int> <int> <chr>             <int>
#>  1     1    18 Employed              1
#>  2     1    21 Employed              1
#>  3     2    18 unemployed            0
#>  4     2    21 Employed              0
#>  5     3    18 unemployed            0
#>  6     3    21 Employed              0
#>  7     4    18 Employed              0
#>  8     4    10 Employed              0
#>  9     5    18 unemployed            0
#> 10     5    21 unemployed            0

Created on 2022-01-23 by the reprex package (v2.0.1)
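In an unbalanced panel some people appear in only one wave, and the two possible wave checks then disagree: all(wave %in% c(18, 21)) only asks that every observed wave is 18 or 21, while all(c(18, 21) %in% wave) asks that both waves were actually observed. A sketch comparing them on a hypothetical person 6 who is observed (employed) only in wave 18:

```r
library(dplyr)

# hypothetical data: person 6 is observed only in wave 18
df <- data.frame(
  id = c(1L, 1L, 6L),
  wave = c(18L, 21L, 18L),
  EMPLOYMENT_STATUS = c("Employed", "Employed", "Employed"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  summarise(
    # TRUE for person 6: the only observed wave is in c(18, 21)
    only_valid_waves = all(wave %in% c(18, 21)),
    # FALSE for person 6: wave 21 is missing
    both_waves = all(c(18, 21) %in% wave),
    all_employed = all(EMPLOYMENT_STATUS == "Employed")
  )
res
```

Combining only_valid_waves with all_employed would give person 6 a dummy of 1; combining both_waves with all_employed gives 0, which matches "employed in both waves".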