Multiple filter using grep and subset in R
I am trying to build a filter that removes rows from a dataset using grep and subset together.
Example dataset:
id <- 1:10
problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
df <- c(id, problem, solution1, solution2)
I want to remove the rows where problem is "a" and "eat" appears in either solution1 or solution2.
The result should drop ids 1, 5 and 10.
I have tried:
df <- subset(df, problem=="a" & !(grepl("eat", df)))
and
df <- df[!grepl("eat", df) & grepl("a", df$problem)]
I can't seem to find a similar solution on Stack Overflow or on the other sites I have Googled.
Any help would be much appreciated. Thanks!
First, if you want a data frame, you should use data.frame, not c:
df <- data.frame(id, problem, solution1, solution2)
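To see why the original c() call could not work, here is a small check that reuses the vectors from the question. The name flat is only an illustrative placeholder. c() coerces everything into a single atomic vector, so there are no rows or columns left to filter:

```r
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")

# What the question built: one long vector ("flat" is a hypothetical name)
flat <- c(id, problem, solution1, solution2)
class(flat)         # "character": the integer ids were coerced to strings
length(flat)        # 40: four vectors of length 10 glued end to end
is.null(dim(flat))  # TRUE: no row/column structure

# What the filter needs: a 10 x 4 data frame
df <- data.frame(id, problem, solution1, solution2)
dim(df)             # 10 4
```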
Then you can subset like this (there is no need for subset itself):
# Keep every row except those with problem "a" and "eat" in either solution column
df2 <- df[!(grepl("a", df$problem) &
            (grepl("eat", df$solution1) |
             grepl("eat", df$solution2))), ]
#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat
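If you would rather keep subset(), the same filter can be sketched as below. The key point is that the negation wraps the whole condition: the first attempt in the question keeps rows with problem "a" and no "eat", instead of dropping the rows that have both. The name kept is only illustrative.

```r
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
df <- data.frame(id, problem, solution1, solution2)

# Column names are looked up inside df, so no df$ prefix is needed here.
# !( ... ) negates the combined "a AND (eat OR eat)" test.
kept <- subset(df, !(problem == "a" & (solution1 == "eat" | solution2 == "eat")))
kept$id  # 2 3 4 6 7 8 9
```

Note that subset() treats a missing (NA) condition as FALSE, so rows where the test evaluates to NA would be dropped as well.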
I would do it like this:
df <- df[!(df$problem %in% "a" & (df$solution1 %in% "eat" | df$solution2 %in% "eat")),]
#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat
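If you are comparing exact strings, a regular expression is not really needed here. Subsetting with %in% will save you a lot of time, because it compares against a whole vector of values: instead of "a" you could just as well have c("a", "b", "c"), and so on.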
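As a rough illustration of that last point, here is a sketch that widens the filter to two problem values. Filtering on both "a" and "b" is a hypothetical extension, not something the question asked for, and to_drop is just an illustrative name. It also shows how grepl differs from an exact match:

```r
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
df <- data.frame(id, problem, solution1, solution2)

# Hypothetical variant: drop rows whose problem is "a" OR "b"
# and that have "eat" in either solution column.
to_drop <- df$problem %in% c("a", "b") &
  (df$solution1 %in% "eat" | df$solution2 %in% "eat")
df[!to_drop, "id"]      # 2 3 4 6 7 8

# grepl matches substrings, %in% matches whole strings only
grepl("eat", "repeat")  # TRUE
"repeat" %in% "eat"     # FALSE
```

With the sample data both approaches give the same rows, but on real data a pattern such as "a" would also match any longer string that merely contains that letter.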