How to create a binary variable for each individual based on the value of another variable?

Question

I have a dataset with 4 individuals. Each individual is measured over a different number of time periods. In R:

df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  t  = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2),
  x1 = c(0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

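I want to create a variable x2 that indicates whether x1 has already taken the value 1 for the given individual, at the current time period or an earlier one. It should look like this: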

"x2" = c(0,1,1,1,1,0,1,1,1,0,0)

... preferably using the dplyr package. So far I have this:

new_df = df %>% dplyr::group_by(id) %>% dplyr::arrange(t)

But I can't get any further from here... The desired result is shown in the picture.

Answer 1

Here is one way to do it with dplyr:

df %>% 
  arrange(id, t) %>%
  group_by(id) %>% 
  mutate(x2 = ifelse(row_number() >= min(row_number()[x1 == 1]), 1, 0))

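Within each id, this sets x2 to 1 when the row number is greater than or equal to the row number of the first row where x1 is 1; otherwise it sets x2 to 0.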

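Note that you will get a warning, because at least one group (here id 4) has no x1 value equal to 1. For such a group min() is called on an empty vector and returns Inf, so every row in that group gets 0.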
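If the warning is unwanted, one possible alternative (not part of the original answer) is a running maximum per group. This is only a sketch and assumes that x1 contains nothing but 0 and 1 and has no missing values; `res` is just an illustrative name.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  t  = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2),
  x1 = c(0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

res <- df %>%
  arrange(id, t) %>%
  group_by(id) %>%
  # cummax() stays at 0 until the first 1 appears, then stays at 1
  mutate(x2 = cummax(x1)) %>%
  ungroup()

res$x2
```

Because cummax() never has to locate the first 1, a group without any 1 (id 4) simply stays at 0 and no warning is raised.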

Another option, in case you want NA for an id that has no x1 value equal to 1 (for example, id 4):

df %>% 
  arrange(id, t) %>%
  group_by(id) %>% 
  mutate(x2 = +(row_number() >= which(x1 == 1)[1]))

Output (this matches the first approach, where id 4 gets 0 rather than NA)

      id     t    x1    x2
   <dbl> <dbl> <dbl> <dbl>
 1     1     1     0     0
 2     1     2     1     1
 3     1     3     0     1
 4     2     1     1     1
 5     2     2     0     1
 6     3     1     0     0
 7     3     2     1     1
 8     3     3     0     1
 9     3     4     1     1
10     4     1     0     0
11     4     2     0     0
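For comparison, the which() variant can be run on the same data to see where it differs. The sketch below is illustrative only: `res_na` and `res_filled` are hypothetical names, and the final coalesce() step is an optional extra (not in the original answer) for turning the NA values back into 0.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  t  = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2),
  x1 = c(0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0)
)

res_na <- df %>%
  arrange(id, t) %>%
  group_by(id) %>%
  # which(x1 == 1)[1] is NA when the group has no 1, so the comparison is NA
  mutate(x2 = +(row_number() >= which(x1 == 1)[1])) %>%
  ungroup()

# Optional: replace the NA values with 0 afterwards
res_filled <- res_na %>%
  mutate(x2 = coalesce(x2, 0L))

res_na$x2
res_filled$x2
```

Here x2 is an integer column, and the two rows of id 4 are NA in `res_na` and 0 in `res_filled`; all other rows agree with the output shown above.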

Tags: r, dplyr, tidyverse, feature-engineering