Remove text before an array of subtexts
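I have a set of strings to process. For each one, if it contains one of a given set of substrings, I want to keep only that substring; otherwise the string should be left unchanged.
Here is an example: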
keep <- c("USA","UNITED STATES")
keep <- paste0(paste0(" ",keep,"$"),collapse="|")
data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
expected_result <- c("DETROIT","USA","UNITED STATES")
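You can use str_extract to extract the pattern if it is present. It returns NA when the pattern is missing, and you can then replace those NA values with the original data.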
keep <- c("USA","UNITED STATES")
keep <- paste0(paste0(" ",keep,"$"),collapse="|")
result <- stringr::str_extract(data, keep)
result[is.na(result)] <- data[is.na(result)]
trimws(result)
#[1] "DETROIT" "USA" "UNITED STATES"
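The same extract-then-fall-back idea can be sketched in base R, for cases where stringr is not available. This is only an illustration, not part of the answer above; the names pattern and m are introduced here for the example.

```r
data <- c("DETROIT", "DETROIT USA", "DETROIT UNITED STATES")
keep <- c("USA", "UNITED STATES")

# Same pattern as in the question: " USA$| UNITED STATES$"
pattern <- paste0(paste0(" ", keep, "$"), collapse = "|")

# regexpr() returns the match position, or -1 where nothing matched
m <- regexpr(pattern, data)

# Start from the original strings so that non-matching entries stay unchanged
result <- data

# regmatches() returns only the matched substrings, one per matching element
result[m != -1] <- trimws(regmatches(data, m))

result
#[1] "DETROIT" "USA" "UNITED STATES"
```

Starting from a copy of data plays the role of the NA replacement step: elements without a match are never touched.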
Alternatively, you can use sub with a capturing group:
data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
keep <- c("USA","UNITED STATES")
# Backslashes must be doubled inside R string literals
regex <- paste0(".*\\s*\\b(",paste0(keep,collapse="|"), ")\\b")
# "\\1" keeps only the text captured by group 1
sub(regex, "\\1", data)
## => [1] "DETROIT" "USA" "UNITED STATES"
The regex is .*\s*\b(USA|UNITED STATES)\b
Details:
.* - any zero or more characters, as many as possible
\s* - zero or more whitespace characters
\b(USA|UNITED STATES)\b - the whole word USA or UNITED STATES, captured into Group 1 (\1 in the replacement pattern).
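To see what the \b word boundaries contribute, here is a small sketch that compares the pattern with and without them. The input "JERUSALEM" is a made-up test value (it happens to contain the letters USA), and perl = TRUE is an added choice for this example so that matching follows PCRE rules; neither comes from the answer above.

```r
data <- c("DETROIT USA", "JERUSALEM")
keep <- c("USA", "UNITED STATES")
alts <- paste0(keep, collapse = "|")

# Pattern with word boundaries, and a variant without them for comparison
with_b    <- paste0(".*\\s*\\b(", alts, ")\\b")
without_b <- paste0(".*\\s*(", alts, ")")

# With \b, "USA" inside "JERUSALEM" is not a whole word, so nothing is replaced
res_with <- sub(with_b, "\\1", data, perl = TRUE)
res_with
#[1] "DETROIT USA" was reduced to "USA"; "JERUSALEM" is unchanged

# Without \b, the prefix "JERUSA" is matched and replaced by the captured "USA"
res_without <- sub(without_b, "\\1", data, perl = TRUE)
res_without
#[1] "USA" "USALEM"
```

Note that the values in keep are pasted into the pattern as-is, so a value containing regex metacharacters (for example a dot) would need escaping first.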