Delete duplicates after coding responses on one variable in R

Question

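Overall goal: delete duplicates after coding responses on one variable.

I have a dataset in which individuals can select multiple options for a variable (e.g. country code). I want to remove the duplicates, but first I want to make sure I have captured all the information.

For example, with country codes where options 1-5 are Europe and 6-10 are Africa, I created a variable Europe that equals 1 if the country code is in 1:5 and 0 if it is in 6:10 (and the reverse for Africa).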

ID	Country Code	Europe	Africa
1	1	1	0
1	4	1	0
1	10	0	1
2	3	1	0
2	10	0	1
3	7	0	1

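But I want to check whether a person answered 1 for both, and then remove the duplicate IDs. So I would like to create a dataset that looks like this: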

ID	Europe	Africa	Both
1	1	1	1
2	1	1	1
3	0	1	0

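I tried this:

aggregate(x=Europe,by=list(ID),FUN=sum) to create a sumEurope (followed by a sumAfrica), and then to say "if sumEurope*sumAfrica > 0, then 1". However, the code above gives an error, "Assigned data..must be compatible with existing data", because the existing data has more rows than the assigned data.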
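The row-count mismatch described above can be reproduced in a small sketch. It assumes the data frame is called df and has the columns shown in the question; the names sums and result are only illustrative. The aggregated output has one row per ID, so it is kept as its own data frame rather than assigned back into the longer original.

```r
# Example data matching the table in the question
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 3L),
  CountryCode = c(1L, 4L, 10L, 3L, 10L, 7L),
  Europe = c(1L, 1L, 0L, 1L, 0L, 0L),
  Africa = c(0L, 0L, 1L, 0L, 1L, 1L)
)

# One row per ID: sums of the two indicator columns
sums <- aggregate(cbind(Europe, Africa) ~ ID, data = df, FUN = sum)

nrow(df)    # number of rows in the original data
nrow(sums)  # number of rows after aggregation (one per ID)

# Build the per-ID result from the aggregated frame instead of
# assigning the shorter columns back into df
result <- data.frame(
  ID = sums$ID,
  Europe = as.integer(sums$Europe > 0),
  Africa = as.integer(sums$Africa > 0)
)
result$Both <- as.integer(result$Europe == 1 & result$Africa == 1)
result
```

Because result is built from sums, every column has the same length and no assignment error should occur.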

Answer 1

We can use summarise with max across Europe and Africa, followed by an ifelse statement:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  summarise(across(c(Europe, Africa), max)) %>% 
  mutate(Both = ifelse(Africa == 1 & Europe == 1, 1, 0))

     ID Europe Africa  Both
  <int>  <int>  <int> <dbl>
1     1      1      1     1
2     2      1      1     1
3     3      0      1     0

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L), CountryCode = c(1L, 
4L, 10L, 3L, 10L, 7L), Europe = c(1L, 1L, 0L, 1L, 0L, 0L), Africa = c(0L, 
0L, 1L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-6L))
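As a variation on the approach in this answer, the indicators could also be derived directly from CountryCode inside summarise, skipping the intermediate Europe and Africa columns. This is only a sketch: it assumes the coding stated in the question (1-5 is Europe, 6-10 is Africa), and the name result is illustrative.

```r
library(dplyr)

# Only ID and CountryCode are needed for this variant
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 3L),
  CountryCode = c(1L, 4L, 10L, 3L, 10L, 7L)
)

result <- df %>%
  group_by(ID) %>%
  summarise(
    # 1 if the person selected at least one code in the range, else 0
    Europe = as.integer(any(CountryCode %in% 1:5)),
    Africa = as.integer(any(CountryCode %in% 6:10))
  ) %>%
  mutate(Both = as.integer(Europe == 1 & Africa == 1))

result
```

Using any() here plays the same role as max() on the 0/1 columns: a group is flagged as soon as one of its rows matches.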

Answer 2

In base R:

transform(+aggregate(cbind(Europe, Africa)~ID, df, any), Both = Europe * Africa)
  ID Europe Africa Both
1  1      1      1    1
2  2      1      1    1
3  3      0      1    0

If you don't mind logical values:

transform(aggregate(cbind(Europe, Africa)~ID, df, any), Both = Europe * Africa)
  ID Europe Africa Both
1  1   TRUE   TRUE    1
2  2   TRUE   TRUE    1
3  3  FALSE   TRUE    0
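To confirm that the duplicates are gone after aggregating, a quick check along these lines may help. It is a sketch that assumes the same df as in the question; res is an illustrative name for the aggregated result.

```r
# Example data matching the table in the question
df <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 3L),
  CountryCode = c(1L, 4L, 10L, 3L, 10L, 7L),
  Europe = c(1L, 1L, 0L, 1L, 0L, 0L),
  Africa = c(0L, 0L, 1L, 0L, 1L, 1L)
)

# Same base R call as above; the unary + turns logical columns into integers
res <- transform(+aggregate(cbind(Europe, Africa) ~ ID, df, any),
                 Both = Europe * Africa)

# anyDuplicated() returns 0 when no ID appears more than once
anyDuplicated(res$ID)

# Every ID from the original data should still be present
setequal(res$ID, df$ID)
```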
