Search for multiple occurrences of a substring in a string

Question

Below is my data set. I am using the function str_detect() on the Key column like this.

str_detect(mydata$Key, 'R')

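I want to be able to search for strings that contain two R's. In the example below I could obviously just search for R002R009, but I won't always know the numbers attached to the R, so I just want to search for strings with two R's.

I need to be able to use this in an ifelse statement.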

 mydata[1:3, ]
           IDENTIFIER  DATE_TIME         X-VALUE     Y-VALUE      Key
    1      214461707   1/04/2019 8:25           1         -3       A001
    2      214461789   1/04/2019 10:16          1         -2       R001
    3      214461789   1/04/2019 10:16          1         -5       R002R009

Answer 1

You can use str_count to count how many times the letter occurs and use that count in filter.

library(dplyr)
library(stringr)

mydata %>% filter(str_count(Key, 'R') == 2)

#   FACILITY_ID      DATE_TIME XVALUE YVALUE      Key
#3   214461789 1/04/201910:16      1     -5 R002R009
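Since the question asks for something usable inside ifelse, the same count can be used there directly. This is only a minimal sketch: the three-row data frame is a stand-in for the real data, and the column name two_R and the "yes"/"no" labels are made up for illustration.

```r
library(stringr)

# Hypothetical sample data mirroring the Key column from the question
mydata <- data.frame(Key = c("A001", "R001", "R002R009"),
                     stringsAsFactors = FALSE)

# str_count() returns the number of "R" matches per element,
# so comparing with 2 gives a logical vector that ifelse() can use
mydata$two_R <- ifelse(str_count(mydata$Key, "R") == 2, "yes", "no")

mydata
```

Comparing with == 2 keeps only keys with exactly two R's; use >= 2 instead if keys with three or more should also be flagged.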

Answer 2

You can use a regular expression in str_detect.

mydata %>% 
  filter(str_detect(string = Key, pattern = "R.*R"))

Result:

         id FACILITY_ID DATE_TIME X.VALUE Y.VALUE      Key
3 214461789   1/04/2019     10:16       1      -5 R002R009
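One thing to keep in mind is that "R.*R" matches any key with at least two R's, not exactly two. If that distinction matters, an anchored pattern is one option. The sketch below assumes a small made-up vector of keys (the value "R1R2R3" is not in the original data and is only there to show the difference).

```r
library(stringr)

# Hypothetical keys; "R1R2R3" is added to show the three-R case
keys <- c("A001", "R001", "R002R009", "R1R2R3")

# Two or more R's anywhere in the string
at_least_two <- str_detect(keys, "R.*R")

# Exactly two R's: only non-R characters are allowed before,
# between, and after the two R's
exactly_two <- str_detect(keys, "^[^R]*R[^R]*R[^R]*$")

data.frame(keys, at_least_two, exactly_two)
```

Either logical vector can be passed to filter() or used as the test in ifelse().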

Answer 3

We can use subset from base R

subset(mydata, nchar(gsub('[^R]+', '', Key)) == 2)
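To see why this works, it may help to look at the intermediate values. The sketch below uses a made-up vector of keys taken from the example rows; the names only_R, r_count and the labels in ifelse are illustrative and not part of the original answer.

```r
# Hypothetical keys copied from the example rows in the question
keys <- c("A001", "R001", "R002R009")

# Step 1: delete every run of characters that is not "R",
# leaving only the R's themselves
only_R <- gsub("[^R]+", "", keys)

# Step 2: the length of what remains is the number of R's
r_count <- nchar(only_R)

# Step 3: the count works as the test in ifelse() without any packages
flag <- ifelse(r_count == 2, "two R", "other")

data.frame(keys, only_R, r_count, flag)
```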


Tags: r, stringr