Multiple filter using grep and subset in R

Question

I am trying to create a filter that uses grep and subset together to remove rows from a dataset.

Example dataset:

id <- 1:10
problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
df <- c(id, problem, solution1, solution2)

I am trying to remove the rows that have problem "a" and also have "eat" in either solution1 or solution2.

As a result, it should remove ids 1, 5 and 10.

I have tried using:

df <- subset(df, problem=="a" & !(grepl("eat", df)))

and

df <- df[!grepl("eat", df) & grepl("a", df$problem)]

I can't seem to find a similar solution on Stack Overflow or on any other site I've googled.

If anyone could help, it would be much appreciated. Thanks!

Answer 1

First of all, if you want a data frame, you should use data.frame, not c:

df <- data.frame(id, problem, solution1, solution2)
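As a quick check of why this matters, here is a small sketch that re-creates the vectors from the question. The name flat is only for illustration. c() coerces everything into one long character vector with no rows or columns, whereas data.frame() keeps each vector as its own column:

```r
# Vectors copied from the question
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")

# c() concatenates and coerces to a single character vector
flat <- c(id, problem, solution1, solution2)
class(flat)   # "character"
length(flat)  # 40
dim(flat)     # NULL, so there are no rows to filter

# data.frame() keeps the four vectors as columns of a 10-row table
df <- data.frame(id, problem, solution1, solution2)
dim(df)       # 10 4
```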

Then you can subset like this (there is no need to use subset itself):

# Drop rows where problem matches "a" AND either solution column matches "eat"
df2 <- df[!(grepl("a", df$problem) &
           (grepl("eat", df$solution1) |
            grepl("eat", df$solution2))),]

#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat
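Since the question asked about subset specifically, the same filter can probably be written with subset() as well. This is only a sketch, assuming exact matches on the column values are what is wanted. It rebuilds the data frame from the question so that it runs on its own:

```r
# Data frame rebuilt from the vectors in the question
df <- data.frame(
  id = 1:10,
  problem = c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a"),
  solution1 = c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat"),
  solution2 = c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
)

# subset() evaluates the condition inside df, so columns are referenced by bare name.
# The whole "problem is a AND eat appears in either solution" test is negated to drop those rows.
df2 <- subset(df, !(problem == "a" & (solution1 == "eat" | solution2 == "eat")))

df2$id  # 2 3 4 6 7 8 9
```

Note that the negation wraps the entire condition. Negating only the "eat" part, as in the attempts in the question, keeps a different set of rows.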

Answer 2

I would do it like this:

df <- df[!(df$problem %in% "a" & (df$solution1 %in% "eat" | df$solution2 %in% "eat")),]

#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat

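If you are comparing exact strings, a regex isn't really needed here. Subsetting with %in% will save you a lot of time, because it matches against a whole vector of values. For example, instead of "a" you could have c("a", "b", "c"), and so on.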
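To illustrate the difference, here is a minimal sketch on a made-up vector. The values "ab" and NA are not in the question's data and are only there to show the edge cases:

```r
# Hypothetical values, chosen to show substring and NA behaviour
problem <- c("a", "ab", "b", "c", NA)

# grepl() matches substrings, so "ab" also counts as a hit for "a"
by_regex <- grepl("a", problem)          # TRUE TRUE FALSE FALSE FALSE

# %in% compares whole values only
exact_one <- problem %in% "a"            # TRUE FALSE FALSE FALSE FALSE

# ...and accepts several values at once
exact_many <- problem %in% c("a", "b")   # TRUE FALSE TRUE FALSE FALSE

# == is also exact, but returns NA for missing values, while %in% returns FALSE
by_equals <- problem == "a"              # TRUE FALSE FALSE FALSE NA
```

The NA behaviour matters when the result is used as a row index: an NA in a logical index produces a row of NAs, whereas %in% simply treats the missing value as not matched.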
