If I have two groups in my data which partially match for a second categorical variable, is there a way to remove non-matching groups?

Question

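For example, I have two datasets (A and B), each with a score column, a location column (England or Wales), and a month column. If dataset A only covers January to October and dataset B only covers April to November, can I filter my data so that it includes only April to October? This is for paired data in a statistical test.

My actual dataset has more than a hundred categorical variables, and perhaps half of them do not match between the groups, so doing this by hand would be inefficient at best.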

Answer 1

Does this reproducible example capture what you are trying to do?

library(tidyverse)

dfa <- tribble(~location, ~month, ~a_score,
        "England", 1, 1,
        "England", 2, 1,
        "England", 3, 1,
        "Wales", 1, 1,
        "Wales", 2, 1,
        "Wales", 3, 1
        )

dfb <- tribble(~location, ~month, ~b_score,
        "England", 2, 2,
        "England", 3, 2,
        "England", 4, 2,
        "Wales", 2, 2,
        "Wales", 3, 2,
        "Wales", 4, 2
)

# inner_join() keeps only the location/month combinations present in both data frames
dfa |> inner_join(dfb, by = c("location", "month"))
#> # A tibble: 4 × 4
#>   location month a_score b_score
#>   <chr>    <dbl>   <dbl>   <dbl>
#> 1 England      2       1       2
#> 2 England      3       1       2
#> 3 Wales        2       1       2
#> 4 Wales        3       1       2

Created on 2022-05-16 by the reprex package (v2.0.1)
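If you would rather keep A and B as separate data frames instead of merging the score columns side by side, a filtering join may be closer to what the question describes. The sketch below is only an illustration: the data are made up to mirror the January to October / April to November example, and the `score` and `dataset` column names are assumptions, not taken from your real data. `semi_join()` keeps the rows of the first data frame that have a match in the second, without adding any columns.

```r
library(dplyr)

# Hypothetical data mirroring the question:
# A covers January to October, B covers April to November
dfa <- expand.grid(location = c("England", "Wales"),
                   month = month.name[1:10],
                   stringsAsFactors = FALSE)
dfa$score <- seq_len(nrow(dfa))

dfb <- expand.grid(location = c("England", "Wales"),
                   month = month.name[4:11],
                   stringsAsFactors = FALSE)
dfb$score <- seq_len(nrow(dfb))

# Keep only the rows of each data frame whose location/month
# combination also appears in the other one
a_matched <- dfa |> semi_join(dfb, by = c("location", "month"))
b_matched <- dfb |> semi_join(dfa, by = c("location", "month"))

# If both groups sit in one long data frame instead (an assumed layout
# with a `dataset` column), keep combinations seen in both datasets
long <- bind_rows(A = dfa, B = dfb, .id = "dataset")
long_matched <- long |>
  group_by(location, month) |>
  filter(n_distinct(dataset) == 2) |>
  ungroup()
```

With many key variables, `by` can take a character vector naming all of them. For a paired test the two filtered data frames still need to be in the same row order, for example via `arrange(location, month)` on both, or you can use the `inner_join()` result above so that each pair sits on one row.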


r

dplyr