Check if values exist in other reference dataframes

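I have the toy dataset below, which stands in for a much larger dataset; these are the important columns. I am trying to check whether the values in DataFrame match the values in the reference dataframes Reference_A, Reference_B, and Reference_C.
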
DataFrame

group   type    value
x       A       Teddy
x       A       William
x       A       Lars
y       B       Robert
y       B       Elsie
y       C       Maeve
y       C       Charlotte
y       C       Bernard


Reference_A

type    value
A       Teddy
A       William
A       Lars

Reference_B

type    value
B       Elsie
B       Dolores

Reference_C

type    value
C       Maeve
C       Hale
C       Bernard

Desired output:

group   type    value      check
x       A       Teddy      TRUE
x       A       William    TRUE
x       A       Lars       TRUE
y       B       Robert     FALSE
y       B       Elsie      TRUE
y       C       Maeve      TRUE
y       C       Charlotte  FALSE
y       C       Bernard    TRUE

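I posted a similar question earlier, but realized that a TRUE/FALSE check might be more efficient. I don't think the order matters, since I can manipulate my data so that all values are unique.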

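You can combine the "reference" dataframes into one dataframe and join it with DataFrame by type. Then, for each combination of type and value, check whether any reference value matches.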

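library(dplyr)

# Collect the reference data frames by name and stack them into one table
mget(paste0('Reference_', c('A', 'B', 'C'))) %>%
   bind_rows() %>%
   # Keep every row of DataFrame; value.x comes from the references, value.y from DataFrame
   right_join(DataFrame, by = 'type') %>%
   group_by(group, type, value = value.y) %>%
   # TRUE if any reference value of the same type equals this value
   summarise(check = any(value.x == value.y), .groups = 'drop')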


#  group type  value     check
#  <chr> <chr> <chr>     <lgl>
#1 x     A     Lars      TRUE 
#2 x     A     Teddy     TRUE 
#3 x     A     William   TRUE 
#4 y     B     Elsie     TRUE 
#5 y     B     Robert    FALSE
#6 y     C     Bernard   TRUE 
#7 y     C     Charlotte FALSE
#8 y     C     Maeve     TRUE 
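
As an alternative, if the original row order should be kept as in the desired output, the same check can be sketched in base R with %in% on a combined key. This is only a minimal sketch, not part of the answer above: the helper make_key is a hypothetical name, and it assumes that the separator "|" never occurs inside type or value.

```r
# Toy data copied from the question (same values as in the "Data" section).
DataFrame <- data.frame(
  group = c("x", "x", "x", "y", "y", "y", "y", "y"),
  type  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c("Teddy", "William", "Lars", "Robert", "Elsie",
            "Maeve", "Charlotte", "Bernard"),
  stringsAsFactors = FALSE
)
Reference_A <- data.frame(type = "A", value = c("Teddy", "William", "Lars"),
                          stringsAsFactors = FALSE)
Reference_B <- data.frame(type = "B", value = c("Elsie", "Dolores"),
                          stringsAsFactors = FALSE)
Reference_C <- data.frame(type = "C", value = c("Maeve", "Hale", "Bernard"),
                          stringsAsFactors = FALSE)

# Stack the reference tables into a single lookup table.
refs <- rbind(Reference_A, Reference_B, Reference_C)

# Hypothetical helper: build a composite "type|value" key for each row.
# Assumption: "|" never appears inside type or value.
make_key <- function(df) paste(df$type, df$value, sep = "|")

# A row passes only if its (type, value) pair appears in the references.
DataFrame$check <- make_key(DataFrame) %in% make_key(refs)

print(DataFrame)
```

Because %in% returns one logical value per row of DataFrame, the rows stay in their original order and no grouping step is needed.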

Data

Reference_A <- structure(list(type = c("A", "A", "A"), 
value = c("Teddy", "William", "Lars")), class = "data.frame", 
row.names = c(NA, -3L))

Reference_B <- structure(list(type = c("B", "B"), value = c("Elsie", "Dolores")), 
class = "data.frame", row.names = c(NA, -2L))

Reference_C <- structure(list(type = c("C", "C", "C"), value = c("Maeve", "Hale", 
"Bernard")), class = "data.frame", row.names = c(NA, -3L))

DataFrame <- structure(list(group = c("x", "x", "x", "y", "y", "y", "y", "y"), 
type = c("A", "A", "A", "B", "B", "C", "C", "C"), value = c("Teddy", 
"William", "Lars", "Robert", "Elsie", "Maeve", "Charlotte", "Bernard"
)), class = "data.frame", row.names = c(NA, -8L))