str_detect also finding NA in filter

Question

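I want to filter out rows where a column contains a given string, using a tidyverse solution. The problem is that str_detect also seems to pick up the NA values, so my filter removes those rows as well: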

df1 = data.frame(x1 = c("PI", NA, "Yes", "text"),
                 x2 = as.character(c(NA, 1, NA,"text")),
                 x3 = c("foo", "bar","foo", "bar"))

> df1
    x1   x2  x3
1   PI <NA> foo
2 <NA>    1 bar
3  Yes <NA> foo
4 text text bar

#remove rows which have "PI" in column `x1`:

df2 = df1%>%
  filter(!str_detect(x1, "(?i)pi"))

> df2
    x1   x2  x3
1  Yes <NA> foo
2 text text bar

How can I prevent str_detect from finding NA?

Answer 1

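Add a condition with is.na and |. The NA issue arises only because str_detect returns NA for NA elements, and filter automatically drops rows where the condition evaluates to NA.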

library(dplyr)
library(stringr)
df1 %>%
    filter(is.na(x1) |
       str_detect(x1, regex("pi", ignore_case = TRUE), negate = TRUE))

-output

   x1   x2  x3
1 <NA>    1 bar
2  Yes <NA> foo
3 text text bar

That is, check the output of str_detect:

with(df1, str_detect(x1, regex("pi", ignore_case = TRUE), negate = TRUE))
[1] FALSE    NA  TRUE  TRUE

The NA stays NA unless we turn it into TRUE:

with(df1, str_detect(x1, regex("pi", ignore_case = TRUE), negate = TRUE) | is.na(x1))
[1] FALSE  TRUE  TRUE  TRUE
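The reason `| is.na(x1)` works comes down to how R evaluates logical operators on NA. A minimal sketch, rebuilding only the x1 column from the question (the variable names `x1` and `keep` are just for illustration):

```r
library(stringr)

# Logical OR with NA is TRUE whenever the other side is TRUE
NA | TRUE    # TRUE
NA | FALSE   # NA

# Negating NA does not help, which is why !str_detect(...) still yields NA
!NA          # NA

x1 <- c("PI", NA, "Yes", "text")

# Same condition as in the question: NA in, NA out
keep <- !str_detect(x1, regex("pi", ignore_case = TRUE))
keep                 # FALSE NA TRUE TRUE

# OR-ing with is.na() turns the NA position into TRUE
keep | is.na(x1)     # FALSE TRUE TRUE TRUE
```
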

Another option is to coalesce with TRUE, so that every NA element returned by str_detect is changed to TRUE:

df1 %>% 
   filter(coalesce(str_detect(x1, regex("pi", ignore_case = TRUE), 
       negate = TRUE), TRUE))
    x1   x2  x3
1 <NA>    1 bar
2  Yes <NA> foo
3 text text bar
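To see what coalesce is doing here in isolation, it may help to apply it to a plain logical vector. This is only a sketch; `flags` is a made-up stand-in for the str_detect output shown earlier:

```r
library(dplyr)

# Stand-in for the str_detect(..., negate = TRUE) result on x1
flags <- c(FALSE, NA, TRUE, TRUE)

# Replace each NA with TRUE: the NA row is kept by filter()
coalesce(flags, TRUE)    # FALSE TRUE TRUE TRUE

# Replace each NA with FALSE: the NA row is dropped, same as filter()'s default
coalesce(flags, FALSE)   # FALSE FALSE TRUE TRUE
```

The fallback value therefore decides whether rows with a missing x1 survive the filter.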

Answer 2

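We can try subset as shown below. Note that this compares x1 against the exact string "PI" rather than matching a case-insensitive pattern: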

> subset(
+   df1,
+   replace(x1 != "PI", is.na(x1), TRUE)
+ )
    x1   x2  x3
2 <NA>    1 bar
3  Yes <NA> foo
4 text text bar
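If a case-insensitive pattern match is needed in base R as well, one possible variant (not part of the answer above, and swapping in grepl for the equality test) relies on grepl returning FALSE rather than NA for missing input, so no extra NA handling is required:

```r
# Same data as in the question
df1 <- data.frame(x1 = c("PI", NA, "Yes", "text"),
                  x2 = as.character(c(NA, 1, NA, "text")),
                  x3 = c("foo", "bar", "foo", "bar"))

# grepl() gives FALSE for NA, so negating it keeps the NA row
grepl("pi", df1$x1, ignore.case = TRUE)   # TRUE FALSE FALSE FALSE

res <- subset(df1, !grepl("pi", x1, ignore.case = TRUE))
res   # rows 2, 3 and 4 remain
```
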
