Select everything except strings with a plus sign regex

Question

Suppose I have the following data frame in R:

df <- tribble(
    ~id, ~key,
    1, "+999..3762962",
    2, "0677219-30911",
    3, "-739812//3918",
    4, "+273$79838",
    5, "1904-03940538",
    6, NA
)

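I want to keep only the rows whose key does not contain a plus sign.

According to the rules of regular expressions, "[^...]" should let me exclude any character I want. However, when I try something like: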

library(tidyverse)

df %>% 
    filter(str_detect(key, "[^\\+]"))

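it doesn't work: it keeps everything except the missing value (it drops row 6).

What am I doing wrong here? I tried searching for similar questions, but they ask for very specific selections with regular expressions, so I could barely understand the resulting code and suggestions. I'm sure the answer is simple.

Thanks.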

Answer 1

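We can use str_detect to check whether the string contains a literal + character and set negate = TRUE (from @thelatemail). The | is.na(key) part is there to also return the rows with missing values, because filter drops rows where the condition evaluates to NA by default.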

library(dplyr)
library(stringr)
df %>% 
   filter(str_detect(key, fixed('+'), negate = TRUE)|is.na(key))
# A tibble: 4 x 2
#    id key          
#  <dbl> <chr>        
#1     2 0677219-30911
#2     3 -739812//3918
#3     5 1904-03940538
#4     6 <NA>
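
To see why the pattern in the question kept every non-missing row, it may help to look at what str_detect returns on the key column alone. This is a minimal sketch; the vector key and the result names any_non_plus, has_plus and no_plus are illustrative and not part of the original code.

```r
library(stringr)

key <- c("+999..3762962", "0677219-30911", "-739812//3918",
         "+273$79838", "1904-03940538", NA)

# "[^+]" asks: is there at least one character that is not "+"?
# Every non-missing key has one, so every non-missing row matches.
any_non_plus <- str_detect(key, "[^+]")
any_non_plus
# TRUE TRUE TRUE TRUE TRUE NA

# fixed("+") asks: does the string contain a literal "+"?
has_plus <- str_detect(key, fixed("+"))
has_plus
# TRUE FALSE FALSE TRUE FALSE NA

# negate = TRUE flips the matches; NA stays NA, hence the is.na() check
no_plus <- str_detect(key, fixed("+"), negate = TRUE)
no_plus
# FALSE TRUE TRUE FALSE TRUE NA
```

A negated character class excludes characters from a single match position; it does not exclude strings that contain those characters somewhere else.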

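Alternatively, to keep the OP's character-class approach, anchor the pattern at both ends: ^[^+]+$ matches only when every character from the start (^) to the end ($) of the string is something other than +.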

df %>% 
   filter(str_detect(key, '^[^+]+$')|is.na(key))
# A tibble: 4 x 2
#     id key          
#  <dbl> <chr>        
#1     2 0677219-30911
#2     3 -739812//3918
#3     5 1904-03940538
#4     6 <NA>
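
The effect of the anchors can be checked on a small vector. The sketch below assumes stringr is installed; the sample string "12+34" and the names x, unanchored and anchored are made up for illustration.

```r
library(stringr)

x <- c("0677219-30911", "+273$79838", "12+34", NA)

# Unanchored: one non-plus character anywhere is enough to match
unanchored <- str_detect(x, "[^+]")
unanchored
# TRUE TRUE TRUE NA

# Anchored: the whole string must consist of non-plus characters,
# so a "+" at the start or in the middle prevents a match
anchored <- str_detect(x, "^[^+]+$")
anchored
# TRUE FALSE FALSE NA
```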

Answer 2

A base R version using grep:

df[grep('+', df$key, fixed = TRUE, invert = TRUE),]

#     id key          
#  <dbl> <chr>        
#1     2 0677219-30911
#2     3 -739812//3918
#3     5 1904-03940538
#4     6 NA
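
A possible variant of the same idea uses grepl with logical indexing instead of row positions. This is only a sketch on a plain data.frame built here for illustration; the names keep and res are hypothetical. It relies on grepl returning FALSE for NA input, so negating it keeps the missing-value row.

```r
df <- data.frame(
  id  = 1:6,
  key = c("+999..3762962", "0677219-30911", "-739812//3918",
          "+273$79838", "1904-03940538", NA)
)

# grepl() gives FALSE for NA, so !grepl() is TRUE for the NA row
keep <- !grepl("+", df$key, fixed = TRUE)

# Keep the rows whose key has no literal "+"
res <- df[keep, ]
res$id
# 2 3 5 6
```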

regex

r

regex-negation

stringr