How to use OR logic outside capture groups in tidyr::extract

I need a way to use an 'or' condition over whole words in tidyr::extract, as in the example below.

Suppose I have the following strings:

string1 <- data.frame(col = "asdnajksdnthingAasdnaksjdnajksn")
string2 <- data.frame(col = "asdnajksdnitemBasdnaksjdnajksn")

I want to extract 'A' and 'B' with the same regex using tidyr::extract(), but I do not want to extract 'word' or 'thing'. The desired output would be:

string1 %>% extract(col = 'col', regex = regex, into = "NewColumn")
> NewColumn
  "A"

string2 %>% extract(col = 'col', regex = regex, into = "NewColumn")
> NewColumn
  "B"

The answer should be something like this:

extract(string, col = "col", into = "NewColumn",
        regex = "(word)|(thing)(.)")

But I can't do that, because it results in:

NewColumn NA
word      A

I know that in this example I could use something like:

"[ti][ht][ie][nm]g?(.)"

but I am looking for a more general solution.

Since tidyr's extract() returns the values of capturing groups, you can put the alternatives you do not want to extract inside a non-capturing group.

The syntax for a non-capturing group is (?:...):

If you do not need the group to capture its match, you can optimize this regular expression into Set(?:Value)?. The question mark and the colon after the opening parenthesis are the syntax that creates a non-capturing group. The question mark after the opening bracket is unrelated to the question mark at the end of the regex.
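To make the difference concrete, here is a minimal sketch in base R (no tidyr needed) that runs the same alternation once with a capturing group and once with a non-capturing group. The variable names are only illustrative, and the input is the first sample string from the question.

```r
# Sample input: the first string from the question
x <- "asdnajksdnthingAasdnaksjdnajksn"

# Capturing alternation: group 1 is the word, group 2 is the character after it
with_capture <- regmatches(x, regexec("(word|thing)(.)", x, perl = TRUE))[[1]]

# Non-capturing alternation: the word is still matched but not stored,
# so the only captured group is the character after it
without_capture <- regmatches(x, regexec("(?:word|thing)(.)", x, perl = TRUE))[[1]]

print(with_capture)     # full match followed by two captured groups
print(without_capture)  # full match followed by one captured group
```

The first element of each result is the full match. Everything after it is what a group-based extractor such as extract() would see, which is why the non-capturing version leaves exactly one value to put into the new column.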

So, use something like:

> library(tidyr)
> string1 <- data.frame (col = "asdnajksdnthingAasdnaksjdnajksn")
> string1 %>% extract(col, c("NewColumn"), "(?:word|thing)(.)")
  NewColumn
1         A
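
For the more general case, one possible approach is to build the pattern from a vector of words. The sketch below is only an illustration: the `prefixes` vector and the `strings` data frame are hypothetical names, and it assumes the second sample string contains "itemB", as the desired output in the question implies.

```r
library(tidyr)

# Both sample strings in one data frame (second value assumed to contain "itemB")
strings <- data.frame(
  col = c("asdnajksdnthingAasdnaksjdnajksn",
          "asdnajksdnitemBasdnaksjdnajksn")
)

# Hypothetical list of words to match but not extract
prefixes <- c("thing", "item")

# Builds "(?:thing|item)(.)": one non-capturing alternation, one capturing group
pattern <- paste0("(?:", paste(prefixes, collapse = "|"), ")(.)")

result <- extract(strings, col, into = "NewColumn", regex = pattern)
print(result)
```

Because the pattern has exactly one capturing group, it lines up with the single name in `into`. If the words could contain regex metacharacters, they would need to be escaped before being pasted into the pattern.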