How to use OR logic outside capture groups in tidyr::extract
I need a way to use an 'or' statement that covers whole words in tidyr::extract, as in the following example.
Suppose I have the following strings:
string1 <- data.frame (col = "asdnajksdn**thingA**asdnaksjdnajksn")
string2 <- data.frame (col = "asdnajksdn**itemA**asdnaksjdnajksn")
I want to use tidyr::extract() to extract 'A' and 'B' with the same regular expression, but I don't want to extract 'word' or 'thing'. The desired output would be:
string1 %>% extract(col = 'col', regex = regex, into = "var")
> NewColumn
"A"
string2 %>% extract(col = 'col', regex = regex, into = "NewColumn")
> NewColumn
"B"
The answer should be something like this:
extract(string, col = "col", into = "NewColumn",
regex = "(word)|(thing)(.)")
But I can't do that, because it results in:
NewColumn NA
word A
I know that in this example I could use something like
"[ti][ht][ie][nm]g?(.)"
but I'm looking for a more general solution.
Since tidyr's extract() returns the values of capturing groups, you can wrap the alternatives you do not want to extract in a non-capturing group.
The syntax for a non-capturing group is (?:...):
If you do not need the group to capture its match, you can optimize this regular expression into Set(?:Value)?. The question mark and the colon after the opening parenthesis are the syntax that creates a non-capturing group. The question mark after the opening bracket is unrelated to the question mark at the end of the regex.
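To make the quoted distinction concrete, here is a minimal base R sketch using the Set(Value)? pattern from the quote. The variable names are only illustrative, and perl = TRUE is passed so the patterns are handled by the PCRE engine.

```r
x <- "SetValue"

# Capturing group: the result holds the full match plus the group's text.
capturing <- regmatches(x, regexec("Set(Value)?", x, perl = TRUE))

# Non-capturing group: same overall match, but no group entry is returned.
non_capturing <- regmatches(x, regexec("Set(?:Value)?", x, perl = TRUE))

print(capturing[[1]])      # full match, then the captured "Value"
print(non_capturing[[1]])  # full match only
```

Both patterns match the same text; only the number of reported groups differs, and that is what extract() relies on when filling the `into` columns.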
So, use something like:
> library(tidyr)
> string1 <- data.frame(col = "asdnajksdnthingAasdnaksjdnajksn")
> string1 %>% extract(col, c("NewColumn"), "(?:word|thing)(.)")
  NewColumn
1         A
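As a rough sketch of how the same regex behaves across several rows, the example below puts a few strings in one data frame. The second and third rows are hypothetical values made up for illustration; they are not taken from the question.

```r
library(tidyr)

# Hypothetical data: only the first string comes from the example above.
strings <- data.frame(
  col = c("asdnajksdnthingAasdnaksjdnajksn",  # contains "thing" followed by "A"
          "asdnajksdnwordBasdnaksjdnajksn",   # made-up row: "word" followed by "B"
          "no match here")                    # made-up row: neither alternative
)

# (?:word|thing) matches either word without capturing it;
# (.) is the only capturing group, so it fills the single `into` column.
result <- tidyr::extract(strings, col, into = "NewColumn",
                         regex = "(?:word|thing)(.)")

print(result)
```

Rows where neither alternative occurs get NA in NewColumn. More alternatives can be added inside the non-capturing group, for example (?:word|thing|item), without changing the number of output columns.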