How to use anti_join with different levels of two variables?
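I've been trying for hours but I can't figure it out. I have a data frame df1 with subjects and conditions, from which I want to exclude observations that have a certain value (less than 3 in the variable "value" from df2). I can't get it to work because I need to remove from df1 combinations of different levels of two variables.
This is df1: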
df1 <- structure(list(subject = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
condition = c("A", "A", "A", "B", "B", "B", "C", "C","C", "A", "A",
"A", "B", "B", "B", "C", "C", "C", "A", "A", "A","B", "B", "B", "C", "C", "C")),
row.names = c(NA, -27L), class = c("tbl_df", "tbl", "data.frame"))
This is df2:
df2 <- structure(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,4L, 4L, 4L, 5L, 5L, 5L),
condition = c("A", "B", "C", "A", "B","C", "A", "B", "C", "A", "B", "C", "A", "B", "C"),
value = c(10L, 8L, 7L, 3L, 8L, 5L, 3L, 3L, 9L, 8L, 7L, 8L, 10L, 6L, 2L)),
row.names = c(NA,-15L), class = c("tbl_df", "tbl", "data.frame"))
And I want to remove from df1 all the combinations of subject and condition with a value under 3, so this would be the final df:
df3 <- structure(list(subject = c(2L, 3L, 3L, 5L),
condition = c("A","A", "B", "C")),
row.names = c(NA, -4L),
class = c("tbl_df","tbl", "data.frame"))
So far I have been doing it like this, but I can't keep doing it because I have hundreds of rows...
df3 <- df1 %>% filter(!(subject==2 & condition=="A" |
subject==3 & (condition=="A" | condition=="B") |
subject==5 & condition=="C"))
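Your sample result for df3 conflicts with the code you used to derive it, so here is a dplyr solution for each interpretation of df3.
Note: both results are only possible if you
...exclude observations which have a certain value (less than [or equal to] 3 in the variable "value" from df2).
So I have used the inequality <= 3 rather than < 3 to implement these solutions.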
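To see why the inequality matters, here is a minimal sketch comparing the two filters on df2. The tibble() call is just a compact reconstruction of the question's df2, and the names strict and inclusive are made up for illustration.

```r
library(dplyr)

# Compact reconstruction of the question's df2 (same 15 rows).
df2 <- tibble(
  subject   = rep(1:5, each = 3),
  condition = rep(c("A", "B", "C"), times = 5),
  value     = c(10L, 8L, 7L, 3L, 8L, 5L, 3L, 3L, 9L, 8L, 7L, 8L, 10L, 6L, 2L)
)

# Strictly "under 3": only the row with value 2 qualifies.
strict <- df2 %>% filter(value < 3)

# "3 or under": also picks up the three rows whose value is exactly 3.
inclusive <- df2 %>% filter(value <= 3)

print(strict)
print(inclusive)
```

With < 3 only subject 5, condition "C" is selected, while <= 3 selects the four combinations listed in the question's sample result.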
Interpretation 1 of df3
To get the version of df3
# A tibble: 4 x 2
  subject condition
    <int> <chr>
1       2 A
2       3 A
3       3 B
4       5 C
that you gave as your sample result here
And I want to remove in df1 all the combinations of subject and condition with a value under 3 so this would be the final df:
df3 <- structure(list(subject = c(2L, 3L, 3L, 5L),
condition = c("A","A", "B", "C")),
row.names = c(NA, -4L),
class = c("tbl_df","tbl", "data.frame"))
simply use filter() on df2:
library(dplyr)
# ...
# Code to generate 'df1' and 'df2'.
# ...
df3 <- df2 %>%
  filter(value <= 3) %>%
  select(subject, condition)  # drop 'value' so the result is the 4 x 2 tibble above
Interpretation 2 of df3
However, it appears you actually want the following version of df3
# A tibble: 18 x 2
   subject condition
     <int> <chr>
 1       1 A
 2       1 A
 3       1 A
 4       1 B
 5       1 B
 6       1 B
 7       1 C
 8       1 C
 9       1 C
10       2 B
11       2 B
12       2 B
13       2 C
14       2 C
15       2 C
16       3 C
17       3 C
18       3 C
which you get here:
df3 <- df1 %>% filter(!(subject==2 & condition=="A" |
subject==3 & (condition=="A" |condition=="B") |
subject==5 & condition=="C"))
In that case, you should anti_join() your df1 to a filter()ed version of df2:
library(dplyr)
# ...
# Code to generate 'df1' and 'df2'.
# ...
df3 <- df1 %>%
anti_join(df2 %>% filter(value <= 3), by = c("subject", "condition"))
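As a sanity check, the following sketch compares the anti_join() result with the hand-written filter() from the question. The tibble() calls are compact reconstructions of the question's df1 and df2, and to_drop, by_join and by_hand are illustrative names, not part of the original code.

```r
library(dplyr)

# Compact reconstructions of the question's data (same rows as the structure() calls).
df1 <- tibble(
  subject   = rep(1:3, each = 9),
  condition = rep(rep(c("A", "B", "C"), each = 3), times = 3)
)
df2 <- tibble(
  subject   = rep(1:5, each = 3),
  condition = rep(c("A", "B", "C"), times = 5),
  value     = c(10L, 8L, 7L, 3L, 8L, 5L, 3L, 3L, 9L, 8L, 7L, 8L, 10L, 6L, 2L)
)

# Every (subject, condition) pair whose value is 3 or under.
to_drop <- df2 %>% filter(value <= 3) %>% select(subject, condition)

# Keep only the rows of df1 that have no match in to_drop.
by_join <- df1 %>% anti_join(to_drop, by = c("subject", "condition"))

# The manual filter from the question, for comparison.
by_hand <- df1 %>% filter(!(subject == 2 & condition == "A" |
                            subject == 3 & (condition == "A" | condition == "B") |
                            subject == 5 & condition == "C"))

print(nrow(by_join))
print(identical(by_join$subject, by_hand$subject) &&
      identical(by_join$condition, by_hand$condition))
```

Both approaches keep the same rows of df1, but the join version needs no changes when df2 grows to hundreds of rows. The pair (5, "C") in to_drop has no counterpart in df1, so it is simply ignored.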