Check if data.frame is a subset of another data.frame

Suppose I have the following lookup table:

(lkp <- structure(list(a = c("a", "a", "a", "b", "c"),
                       b = c("a1 a2", "a3 a2", "a3", "a1", "a1")), 
                       row.names = c("lkp_1", "lkp_2", "lkp_3", "lkp_4", "lkp_5"), 
                       class = "data.frame"))
#       a     b
# lkp_1 a a1 a2
# lkp_2 a a3 a2
# lkp_3 a    a3
# lkp_4 b    a1
# lkp_5 c    a1 

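I want to check whether another data.frame x is a subset of lkp, with one important additional requirement: for column b, a match means that lkp$b only needs to contain x$b, not equal it.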
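
To make the containment rule concrete, here is a minimal sketch of a row-level predicate. It assumes (my assumption, not stated above) that lkp$b holds space-separated tokens and that "contains" means x$b equals one of those tokens. The helpers b_contains and row_matches are hypothetical names.

```r
# Hypothetical helper: is the token x_b one of the space-separated
# tokens in lkp_b? (Assumes tokens are separated by single spaces.)
b_contains <- function(x_b, lkp_b) {
  x_b %in% strsplit(lkp_b, " ", fixed = TRUE)[[1]]
}

# Hypothetical helper: a row of x matches a row of lkp if column a is
# equal and lkp's column b contains x's column b.
row_matches <- function(x_a, x_b, lkp_a, lkp_b) {
  x_a == lkp_a && b_contains(x_b, lkp_b)
}

row_matches("a", "a2", "a", "a1 a2")  # TRUE
row_matches("a", "a1", "a", "a3 a2")  # FALSE: a1 is not a token of "a3 a2"
row_matches("b", "a1", "a", "a1 a2")  # FALSE: column a differs
```

Because the comparison is on whole tokens, b_contains("a1", "a10") is FALSE, which a plain substring test would not guarantee.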

The following examples should illustrate what I mean:

(chk <- list(c1 = structure(list(a = c("a", "a"), b = c("a2", "a2")), row.names = c(NA, -2L), class = "data.frame"), 
             c2 = structure(list(a = "b", b = "a1"), row.names = c(NA, -1L), class = "data.frame"), 
             c3 = structure(list(a = c("a", "a"), b = c("a1", "a1")), row.names = c(NA, -2L), class = "data.frame"), 
             c4 = structure(list(a = c("a", "a"), b = c("a3", "a2")), row.names = c(NA, -2L), class = "data.frame")))

# $c1
#   a  b
# 1 a a2
# 2 a a2

# $c2
#   a  b
# 1 b a1

# $c3
#   a  b
# 1 a a1
# 2 a a1

# $c4
#   a  b
# 1 a a3
# 2 a a2

In principle, I am looking for a merge (or join) whose join condition uses some kind of fuzzy matching.
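
One way to express such a join in base R is a cross join followed by a filter on the join condition. This is only an illustrative sketch, assuming the tokens in lkp$b are separated by single spaces; it uses chk$c4 as x, and the output column names are my own choice.

```r
lkp <- data.frame(
  a = c("a", "a", "a", "b", "c"),
  b = c("a1 a2", "a3 a2", "a3", "a1", "a1"),
  row.names = paste0("lkp_", 1:5),
  stringsAsFactors = FALSE
)
x <- data.frame(a = c("a", "a"), b = c("a3", "a2"),
                stringsAsFactors = FALSE)  # same data as chk$c4

# All (x row, lkp row) index pairs, i.e. a cross join.
pairs <- expand.grid(i = seq_len(nrow(x)), j = seq_len(nrow(lkp)))

# Join condition: equal a, and lkp$b contains x$b as a whole token.
keep <- mapply(
  function(i, j) {
    x$a[i] == lkp$a[j] &&
      x$b[i] %in% strsplit(lkp$b[j], " ", fixed = TRUE)[[1]]
  },
  pairs$i, pairs$j
)

# Keep only the pairs that satisfy the condition.
joined <- data.frame(
  x_row   = pairs$i[keep],
  lkp_row = rownames(lkp)[pairs$j[keep]],
  a       = x$a[pairs$i[keep]],
  x_b     = x$b[pairs$i[keep]],
  lkp_b   = lkp$b[pairs$j[keep]],
  stringsAsFactors = FALSE
)
joined  # 4 matching (x row, lkp row) pairs
```

The cross join grows as nrow(x) * nrow(lkp), which is fine for small lookup tables like this one.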

I found and read these two SO answers:

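The second answer in particular looks promising. However, I don't need approximate matching; what I need is a does_contain relation instead of plain equality. So maybe a regex solution would work?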
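
A regex solution needs some care, because grepl() does substring matching: the pattern "a1" would also match a value such as "a10" (a hypothetical value, not in lkp). A minimal sketch of anchoring at word boundaries, assuming the values in b are simple tokens without regex metacharacters:

```r
# lkp$b plus one hypothetical extra value, "a10"
lkp_b <- c("a1 a2", "a3 a2", "a3", "a1", "a1", "a10")

# Plain grepl() is a substring/regex match, so "a1" also hits "a10".
substring_hit <- grepl("a1", lkp_b)
substring_hit  # TRUE FALSE FALSE TRUE TRUE TRUE

# \b anchors the pattern at word boundaries, so "a10" no longer matches.
word_hit <- grepl("\\ba1\\b", lkp_b, perl = TRUE)
word_hit       # TRUE FALSE FALSE TRUE TRUE FALSE
```

If the tokens could contain regex metacharacters, splitting on spaces and comparing with %in% avoids escaping altogether.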

Expected result

magic_is_subset_function <- function(chk, lkp) {
   # ...
}
sapply(chk, magic_is_subset_function, lkp = lkp)
#    c1    c2    c3    c4 
#  TRUE  TRUE FALSE  TRUE
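
In the expected result, c3 is FALSE even though lkp_1 contains a1, while c1 (two identical rows) is TRUE. My inference is that each row of chk has to be matched to its own, distinct row of lkp, i.e. a subset in the multiset sense. Under that assumption the check becomes a bipartite matching problem. The sketch below uses Kuhn's augmenting-path algorithm (a technique I am swapping in, not something from the question) and treats "contains" as membership among the space-separated tokens of lkp$b (also an assumption).

```r
lkp <- data.frame(
  a = c("a", "a", "a", "b", "c"),
  b = c("a1 a2", "a3 a2", "a3", "a1", "a1"),
  row.names = paste0("lkp_", 1:5),
  stringsAsFactors = FALSE
)
chk <- list(
  c1 = data.frame(a = c("a", "a"), b = c("a2", "a2"), stringsAsFactors = FALSE),
  c2 = data.frame(a = "b", b = "a1", stringsAsFactors = FALSE),
  c3 = data.frame(a = c("a", "a"), b = c("a1", "a1"), stringsAsFactors = FALSE),
  c4 = data.frame(a = c("a", "a"), b = c("a3", "a2"), stringsAsFactors = FALSE)
)

magic_is_subset_function <- function(chk, lkp) {
  n <- nrow(chk)
  m <- nrow(lkp)
  if (n > m) return(FALSE)  # more rows than lkp can supply

  # ok[i, j]: chk row i may use lkp row j (equal a, b among lkp's tokens)
  tokens <- strsplit(lkp$b, " ", fixed = TRUE)
  ok <- matrix(FALSE, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      ok[i, j] <- chk$a[i] == lkp$a[j] && chk$b[i] %in% tokens[[j]]
    }
  }

  owner <- rep(NA_integer_, m)  # owner[j]: chk row currently assigned to lkp row j
  seen <- logical(m)            # lkp rows visited in the current search

  # Try to give chk row i an lkp row, re-routing earlier rows if needed.
  augment <- function(i) {
    for (j in which(ok[i, ])) {
      if (seen[j]) next
      seen[j] <<- TRUE
      if (is.na(owner[j]) || augment(owner[j])) {
        owner[j] <<- i
        return(TRUE)
      }
    }
    FALSE
  }

  for (i in seq_len(n)) {
    seen[] <- FALSE
    if (!augment(i)) return(FALSE)  # row i cannot get a distinct lkp row
  }
  TRUE
}

sapply(chk, magic_is_subset_function, lkp = lkp)
#    c1    c2    c3    c4 
#  TRUE  TRUE FALSE  TRUE
```

The augmenting step matters for c1: the second (a, a2) row first wants lkp_1, which is taken, so the first row is moved to lkp_2.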
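
Two base R attempts, both using grepl for the containment test:

sapply(
    chk,
    function(v) {
        # lkp rows x v rows: equal a AND v$b found (as a regex) in lkp$b
        hits <- sapply(v$a, `==`, lkp$a) & sapply(v$b, grepl, x = lkp$b)
        # count lkp rows matched by at least one row of v;
        # there must be at least as many as v has rows
        sum(rowSums(hits) > 0) >= nrow(v)
    }
)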
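
The first approach counts how many lkp rows are hit by at least one row of v and compares that count with nrow(v). That is necessary for giving every row of v its own lkp row, but as far as I can tell it is not sufficient. A small, made-up counterexample (count_heuristic is my name for the same expression wrapped in a function):

```r
lkp <- data.frame(
  a = c("a", "a", "a", "b", "c"),
  b = c("a1 a2", "a3 a2", "a3", "a1", "a1"),
  row.names = paste0("lkp_", 1:5),
  stringsAsFactors = FALSE
)

# The first approach, wrapped in a function.
count_heuristic <- function(v) {
  sum(
    rowSums(sapply(v$a, `==`, lkp$a) &
              sapply(v$b, grepl, x = lkp$b)) > 0
  ) >= nrow(v)
}

# Two rows need (a, a1), but only lkp_1 has a == "a" and contains a1.
v <- data.frame(a = c("a", "a", "a"), b = c("a1", "a1", "a3"),
                stringsAsFactors = FALSE)
count_heuristic(v)  # TRUE: lkp_1, lkp_2 and lkp_3 are each hit at least once
```

By the same reasoning that makes c3 FALSE, this v should be FALSE, since both (a, a1) rows compete for lkp_1.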

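sapply(
    chk,
    function(v) {
        sum(
            colSums(
                # one v-rows x lkp-rows logical matrix per column, where
                # grepl(a, b) tests whether v's value occurs in lkp's value;
                # the per-column matrices are combined with &
                do.call(
                    `&`,
                    Map(
                        function(x, y) outer(x, y, FUN = Vectorize(function(a, b) grepl(a, b))),
                        v,
                        lkp
                    )
                )
            ) > 0  # lkp rows hit by at least one row of v
        ) >= nrow(v)
    }
)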
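
Two details of the second approach are worth noting: column a is also compared with grepl, so "a" would match a hypothetical value like "ab", and do.call(`&`, ...) only works with exactly two columns because & is a binary operator. A sketch of a variant with one match function per column (exact equality for a, token containment for b; the match_funs list is my own, hypothetical configuration), combined with Reduce so any number of columns works:

```r
lkp <- data.frame(
  a = c("a", "a", "a", "b", "c"),
  b = c("a1 a2", "a3 a2", "a3", "a1", "a1"),
  row.names = paste0("lkp_", 1:5),
  stringsAsFactors = FALSE
)
v <- data.frame(a = c("a", "a"), b = c("a3", "a2"),
                stringsAsFactors = FALSE)  # same data as chk$c4

# One match function per column: exact for a, token containment for b.
match_funs <- list(
  a = function(x, y) x == y,
  b = function(x, y) x %in% strsplit(y, " ", fixed = TRUE)[[1]]
)

# For each column, outer() builds an nrow(v) x nrow(lkp) logical matrix;
# Reduce(`&`, ...) ANDs them together for any number of columns.
hits <- Reduce(`&`, Map(
  function(f, x, y) outer(x, y, Vectorize(f)),
  match_funs, v[names(match_funs)], lkp[names(match_funs)]
))
hits
# row 1 (a3): FALSE  TRUE  TRUE FALSE FALSE
# row 2 (a2):  TRUE  TRUE FALSE FALSE FALSE
```

hits[i, j] marks lkp row j as a candidate for row i of v; whether every row of v can get its own candidate is then an assignment question rather than a simple count.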

This gives:

   c1    c2    c3    c4 
 TRUE  TRUE FALSE  TRUE