How to check for the presence of multiple strings in each value of a particular column of an R data frame?

Question

How can we identify all row entries of a particular column that contain a given set of keywords?

For example, I have the following data frame:

test <- data.frame(nom = 1:5, name = c("ser bla", "onlybla", "inspectiongfa serdafds", "inspection", "serbla blainspection"))

The keywords I am interested in are "ser" and "inspection".

What I am looking for is every value of the second column (i.e. name) in which both keywords occur together.

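So basically, my output should contain the name values of rows 3 and 5, i.e. "inspectiongfa serdafds" and "serbla blainspection". (Row 4, "inspection", contains only one of the two keywords.)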

What I have tried:

I first generate a truth table that records, for each row of the data frame, whether each keyword is present:

as.data.frame(sapply(c("ser", "inspection"), grepl, test$name))
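To make the shape of that truth table concrete, here is a small sketch of what the inner sapply() call returns. sapply() hands each keyword to grepl() as its first argument (pattern), with test$name supplied as x every time, so the result is a logical matrix with one row per value of name and one column per keyword. The variable name truth is just an illustrative choice.

```r
test <- data.frame(nom = 1:5,
                   name = c("ser bla", "onlybla", "inspectiongfa serdafds",
                            "inspection", "serbla blainspection"))

# Each keyword becomes grepl()'s pattern; test$name is the x argument of every call
truth <- sapply(c("ser", "inspection"), grepl, test$name)

dim(truth)       # 5 rows (values of name) x 2 columns (keywords)
colnames(truth)  # column names are taken from the keyword vector
truth            # rows 3 and 5 are the only ones that are TRUE in both columns
```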

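Once I have this, all I need to do is identify the rows whose values are TRUE TRUE, since those are the cases where both keywords of interest are present. These are again rows 3 and 5.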

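However, I can't figure out how to use the TRUE TRUE pairs to pick out those rows, nor whether the whole approach is overkill and could be done more efficiently.

Any help would be appreciated. Thanks!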

Answer 1

You're almost there :)

Here is a solution that extends what you did:

# store your logic test outcomes
conditions_df <- as.data.frame(sapply(c("ser", "inspection"), grepl, test$name))

# FALSE counts as 0 and TRUE as 1, so rowSums() gives the number of keywords found in each row;
# a sum of 2 means TRUE in both columns. which() returns the indices of those rows,
# i.e. the rows of test we want to keep
locate_rows <- which(rowSums(conditions_df) == 2)
test$name[locate_rows]
[1] "inspectiongfa serdafds" "serbla blainspection"
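If you would rather not hard-code the 2 (the number of keywords), one option is to skip the intermediate data frame and combine the per-keyword grepl() results with & directly. The sketch below does that with Reduce(); the helper name rows_with_all_keywords is hypothetical, and fixed = TRUE is an added assumption that the keywords are literal text rather than regular expressions.

```r
# Hypothetical helper: returns TRUE where a value of x contains every keyword
rows_with_all_keywords <- function(x, keywords) {
  # One logical vector per keyword (fixed = TRUE: keywords treated as plain text)
  hits <- lapply(keywords, function(k) grepl(k, x, fixed = TRUE))
  # AND the vectors together element-wise; starting from all-TRUE means an
  # empty keyword vector keeps every row
  Reduce(`&`, hits, init = rep(TRUE, length(x)))
}

test <- data.frame(nom = 1:5,
                   name = c("ser bla", "onlybla", "inspectiongfa serdafds",
                            "inspection", "serbla blainspection"))

keep <- rows_with_all_keywords(test$name, c("ser", "inspection"))
test$name[keep]  # just the matching values
test[keep, ]     # or the whole matching rows
```

This still calls grepl() once per keyword, so it is not fundamentally faster than the rowSums() version; the gain is mainly that the keyword count is no longer hard-coded and no data frame is built along the way.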

