Check if values exist in other reference dataframes
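I have the toy dataset below, which represents a much larger dataset; these are the important columns. I am trying to check whether the values in DataFrame match those in the reference data frames Reference_A, Reference_B, and Reference_C.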
DataFrame
group type value
x A Teddy
x A William
x A Lars
y B Robert
y B Elsie
y C Maeve
y C Charlotte
y C Bernard
Reference_A
type value
A Teddy
A William
A Lars
Reference_B
type value
B Elsie
B Dolores
Reference_C
type value
C Maeve
C Hale
C Bernard
Desired output:
group type value check
x A Teddy TRUE
x A William TRUE
x A Lars TRUE
y B Robert FALSE
y B Elsie TRUE
y C Maeve TRUE
y C Charlotte FALSE
y C Bernard TRUE
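I posted a similar question here, but realized that checking for TRUE and FALSE might be more efficient. I don't think the order matters, since I can manipulate my data so that all values are unique.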
You can combine the "Reference" data frames into one data frame and join it with DataFrame by type. Then, for each type and value, you can check whether any reference value matches.
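library(dplyr)

# Collect the reference data frames by name and stack them into one
mget(paste0('Reference_', c('A', 'B', 'C'))) %>%
  bind_rows() %>%
  # Keep every row of DataFrame: value.x is the reference value, value.y the value to check
  right_join(DataFrame, by = 'type') %>%
  group_by(group, type, value = value.y) %>%
  # TRUE if any reference value of the same type equals the value
  summarise(check = any(value.x == value.y))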
# group type value check
# <chr> <chr> <chr> <lgl>
#1 x A Lars TRUE
#2 x A Teddy TRUE
#3 x A William TRUE
#4 y B Elsie TRUE
#5 y B Robert FALSE
#6 y C Bernard TRUE
#7 y C Charlotte FALSE
#8 y C Maeve TRUE
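If the original row order should be kept, or dplyr is not available, the same check can be sketched in base R with `%in%`. This is an alternative to the join above, not the answer's method; the helper `make_key` and the separator `"\r"` are illustrative assumptions (the separator is assumed not to occur in the data).

```r
# Toy data copied from the question.
Reference_A <- data.frame(type = "A", value = c("Teddy", "William", "Lars"))
Reference_B <- data.frame(type = "B", value = c("Elsie", "Dolores"))
Reference_C <- data.frame(type = "C", value = c("Maeve", "Hale", "Bernard"))
DataFrame <- data.frame(
  group = c("x", "x", "x", "y", "y", "y", "y", "y"),
  type  = c("A", "A", "A", "B", "B", "C", "C", "C"),
  value = c("Teddy", "William", "Lars", "Robert", "Elsie",
            "Maeve", "Charlotte", "Bernard")
)

# Stack the reference tables into one lookup table.
reference <- rbind(Reference_A, Reference_B, Reference_C)

# Hypothetical helper: build one composite key per row from type and value.
# "\r" is an arbitrary separator assumed not to appear in either column.
make_key <- function(df) paste(df$type, df$value, sep = "\r")

# A row passes when its (type, value) pair occurs in the lookup table.
DataFrame$check <- make_key(DataFrame) %in% make_key(reference)
DataFrame
```

Because `%in%` never returns NA, a type with no reference table yields FALSE here, whereas the join-based version would yield NA for such rows.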
Data
Reference_A <- structure(list(type = c("A", "A", "A"),
value = c("Teddy", "William", "Lars")), class = "data.frame",
row.names = c(NA, -3L))
Reference_B <- structure(list(type = c("B", "B"), value = c("Elsie", "Dolores")),
class = "data.frame", row.names = c(NA, -2L))
Reference_C <- structure(list(type = c("C", "C", "C"), value = c("Maeve", "Hale",
"Bernard")), class = "data.frame", row.names = c(NA, -3L))
DataFrame <- structure(list(group = c("x", "x", "x", "y", "y", "y", "y", "y"),
type = c("A", "A", "A", "B", "B", "C", "C", "C"), value = c("Teddy",
"William", "Lars", "Robert", "Elsie", "Maeve", "Charlotte", "Bernard"
)), class = "data.frame", row.names = c(NA, -8L))