How to match sequences of boolean values in a data.frame column?

Question

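I need to match a specific sequence of boolean values in a column, such as the 'sample' vector below. In this case, I want to know whether the pattern occurs anywhere in the column.

However, my code seems to ignore the explicit pattern in the sample vector and instead returns TRUE if it can find any item of the sample vector in the column. In this case, only column test1 should evaluate to TRUE.

How can I match the pattern correctly?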

sample <- rep(TRUE, 3)
df <- data.frame(test1 = rep(c(TRUE, FALSE, TRUE, TRUE), 3),
                 test2 = rep(c(TRUE, FALSE, FALSE, TRUE), 3),
                 test3 = rep(c(TRUE, FALSE, FALSE, FALSE), 3))

> lapply(1:ncol(df), FUN=function(i){any(sample %in% df[,i])})
[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE
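
The behaviour can be reproduced in isolation. `%in%` tests each element of its left-hand side for membership in the right-hand side, so order and adjacency never enter into it. A minimal sketch, where `pattern` and `col` are hypothetical names and `col` contains no run of three TRUE values:

```r
pattern <- rep(TRUE, 3)
col <- c(TRUE, FALSE, FALSE, FALSE)  # hypothetical column with a single TRUE

# membership is checked element by element; position is ignored
pattern %in% col
#[1] TRUE TRUE TRUE

any(pattern %in% col)
#[1] TRUE
```

A single TRUE anywhere in the column is enough for every element of the pattern to be "found", which is why all three columns report TRUE.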

Answer 1

You can try,

which(rowSums(mapply(function(x, y) x == y, sample, df)) == ncol(df))

Answer 2

stringifiedSample <- paste(sample, collapse = " ")

lapply(df, function(col) {
  grepl(stringifiedSample, paste(col, collapse = " "), fixed = TRUE)
})

$test1
[1] TRUE

$test2
[1] FALSE

$test3
[1] FALSE
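
If a named logical vector is more convenient than a list, the same string comparison can be wrapped in `vapply`. This is only a sketch of that variation, assuming the data.frame from the question; `pattern`, `stringified` and `has_pattern` are illustrative names, chosen to avoid masking the base function `sample`.

```r
pattern <- rep(TRUE, 3)
df <- data.frame(test1 = rep(c(TRUE, FALSE, TRUE, TRUE), 3),
                 test2 = rep(c(TRUE, FALSE, FALSE, TRUE), 3),
                 test3 = rep(c(TRUE, FALSE, FALSE, FALSE), 3))

# collapse the pattern once, then search for it in each collapsed column
stringified <- paste(pattern, collapse = " ")
has_pattern <- vapply(df, function(col) {
  grepl(stringified, paste(col, collapse = " "), fixed = TRUE)
}, logical(1))

has_pattern
#test1 test2 test3
# TRUE FALSE FALSE
```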

Answer 3

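Here is another solution.

I must admit that this function is not mine. I saw it somewhere a long time ago, and the source is not noted in the .R file where I found it, so I cannot give credit.

But it does what is asked: it finds the occurrences of a given subsequence in a vector. I recreated the data because the data.frame in the question had two columns named test2 and sample is the name of a base function. I changed both.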

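occurs <- function(x, y) {
  m <- length(x)
  n <- length(y)
  # every start position in y that leaves room for all of x
  candidate <- seq.int(length.out = n - m + 1L)
  for (i in seq.int(length.out = m)) {
    # keep only the start positions whose i-th element matches x[i]
    candidate <- candidate[x[i] == y[candidate + i - 1L]]
  }
  candidate
}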

x <- rep(TRUE, 3)
df1 <- data.frame(test1 = rep(c(TRUE, FALSE, TRUE, TRUE), 3),
                  test2 = rep(c(TRUE, FALSE, FALSE, TRUE), 3),
                  test3 = rep(c(TRUE, FALSE, FALSE, FALSE), 3))

occurs(x, df1$test1)
#[1] 3 7

lapply(df1, \(y) occurs(x, y))
#$test1
#[1] 3 7
#
#$test2
#integer(0)
#
#$test3
#integer(0)

The sequence occurs twice in test1, starting at positions 3 and 7, and does not occur in the other columns.
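
To get back to the single TRUE/FALSE per column that the question asks for, the positions can be reduced to "at least one match". A minimal sketch, with `occurs` repeated so the snippet runs on its own; `has_match` is an illustrative name:

```r
occurs <- function(x, y) {
  m <- length(x)
  n <- length(y)
  candidate <- seq.int(length.out = n - m + 1L)
  for (i in seq.int(length.out = m)) {
    candidate <- candidate[x[i] == y[candidate + i - 1L]]
  }
  candidate
}

x <- rep(TRUE, 3)
df1 <- data.frame(test1 = rep(c(TRUE, FALSE, TRUE, TRUE), 3),
                  test2 = rep(c(TRUE, FALSE, FALSE, TRUE), 3),
                  test3 = rep(c(TRUE, FALSE, FALSE, FALSE), 3))

# a column matches when occurs() returns at least one start position
has_match <- vapply(df1, function(y) length(occurs(x, y)) > 0L, logical(1))
has_match
#test1 test2 test3
# TRUE FALSE FALSE
```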

Edit

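Here is a run of occurs with only the first df1 column, renamed y.
The algorithm moves a window over y and matches it against the sequence in x.
Reassigning candidate on each iteration of the loop successively shortens the candidate vector. In the end, only the start positions in y whose elements match all the elements of x are still in candidate.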

x <- rep(TRUE, 3)
y <- rep(c(TRUE, FALSE, TRUE, TRUE), 3)

m <- length(x)
n <- length(y)
(candidate <- seq.int(length = n - m + 1L))
# [1]  1  2  3  4  5  6  7  8  9 10

i <- 1L
which(x[i] == y[candidate + i - 1L])
which(x[1] == y[c(1,2,3,4,5,6,7,8,9,10)])
which(TRUE == c(TRUE,FALSE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,FALSE))
#[1] 1 3 4 5 7 8 9

# remove 2nd, 6th and 10th from candidate
(candidate <- candidate[x[i] == y[candidate + i - 1L]])
#[1] 1 3 4 5 7 8 9

i <- 2L
which(x[i] == y[candidate + i - 1L])
which(x[2] == y[c(2,4,5,6,8,9,10)])
which(TRUE == c(FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE))
#[1] 2 3 5 6

# shorten to 2nd, 3rd, 5th and 6th of candidate
(candidate <- candidate[x[i] == y[candidate + i - 1L]])
#[1] 3 4 7 8

i <- 3L
which(x[i] == y[candidate + i - 1L])
which(x[3] == y[c(5,6,9,10)])
which(TRUE == c(TRUE,FALSE,TRUE,FALSE))
#[1] 1 3

# shorten to 1st and 3rd of candidate
(candidate <- candidate[x[i] == y[candidate + i - 1L]])
#[1] 3 7
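
For comparison, the same sliding-window idea can be written more directly by testing each window as a whole. This is not the function from the answer, just a sketch of an alternative; `occurs_window` is a hypothetical name, and it is likely slower on long vectors because it loops over every start position instead of pruning candidates.

```r
occurs_window <- function(x, y) {
  m <- length(x)
  n <- length(y)
  # no window fits when the pattern is empty or longer than y
  if (m == 0L || m > n) return(integer(0))
  starts <- seq_len(n - m + 1L)
  # compare each length-m window of y with x
  hits <- vapply(starts, function(s) all(y[s:(s + m - 1L)] == x), logical(1))
  starts[hits]
}

x <- rep(TRUE, 3)
y <- rep(c(TRUE, FALSE, TRUE, TRUE), 3)
occurs_window(x, y)
#[1] 3 7
```

Unlike the original, this version returns integer(0) instead of raising an error when x is longer than y.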
