Regex OR condition for text in R

Suppose I have the following texts:

1) "Project:ABC is located near CBA, being too far from city  "
2) "P r o j e c t : PQR is located near RQP, highlights some greenary"

I want to extract the text between the word "Project" and ",", so that my output is "ABC is located near CBA" from text 1 and "PQR is located near RQP" from text 2. For this I used the following regex:

x="Project:ABC is located near CBA, being too far from city  "
sub(".*Project: *(.*?) *, .*", "\\1", x)
Output:
ABC is located near CBA

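But for text 2 it does not give the correct output, so how can I include an OR condition so that both cases are handled? Any suggestions would be helpful. Thanks.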

You can use a regex with lookahead and lookbehind assertions.

A small example using stringr:

Vec <- c("Project:ABC is located near CBA, being too far from city", 
         "P r o j e c t : PQR is located near RQP, highlights some greenary")
library(stringr)
str_extract(Vec, "(?<=:).*(?=,)")
#> [1] "ABC is located near CBA"  " PQR is located near RQP"

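If your input is more complex, you should tighten the regex, because it may not be strict enough (currently it matches anything between the first : and the last ,). Note also that the second result keeps its leading space.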
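One possible way to tighten it, sketched here in base R rather than stringr: a negated character class keeps the match from crossing a comma, and trimws() drops the leftover leading space. The variable names and the third input string (with a second comma) are made up for illustration.

```r
# Hypothetical inputs; the third has a second comma to show the difference.
vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary",
         "Project:ABC is located near CBA, far from city, noisy")

# (?<=:) starts right after a colon, [^,]* cannot cross a comma,
# and (?=,) requires a comma to follow. Lookarounds need perl = TRUE.
m <- regexpr("(?<=:)[^,]*(?=,)", vec, perl = TRUE)

# regmatches() returns the matched substrings; trimws() removes stray spaces.
strict <- trimws(regmatches(vec, m))
print(strict)
```

With the greedy .* version, the third string would instead run on to the last comma.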

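One option in base R is gsub: match any characters (.*) up to a : followed by zero or more whitespace characters (\s*), or (|) a , followed by any other characters, and replace the match with an empty string ("").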

gsub(".*:\\s*|,.*", "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP"
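One caveat with this pattern, shown as a small sketch: .*: is greedy, so on a string with more than one colon it removes everything up to the last one. The string below is the third element from the data in this thread; the anchored variant is only one assumed way to stop at the first colon.

```r
# A string with two colons (third element of the thread's data).
x <- "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"

# Greedy: .*: runs to the LAST colon, so only the tail survives.
greedy <- gsub(".*:\\s*|,.*", "", x)

# Anchored: ^[^:]*: stops at the FIRST colon, then ,.* drops the rest.
anchored <- gsub("^[^:]*:\\s*|,.*", "", x)

print(greedy)
print(anchored)
```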

If we need to match Project followed by a :, allowing optional whitespace between the letters:

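# "\\\\s*" in the replacement yields a literal \s*, allowing optional whitespace around each letter
pat <- paste0(gsub("", "\\\\s*", "Project"), "\\s*:\\s*|\\s*,.*")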
gsub(pat, "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP" "Ganga gnd A3 And 3.."   

Data

Vec <- c("Project:ABC is located near CBA, being too far from city", 
 "P r o j e c t : PQR is located near RQP, highlights some greenary", 
 "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"
 )

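If the word Project itself does not matter and each text contains a single : and a single , you can locate them by position: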

> text
[1] "Project:ABC is located near CBA, being too far from city  "
> substr(text,grep(":",strsplit(text,'')[[1]]),grep(",",strsplit(text,'')[[1]]))
[1] ":ABC is located near CBA,"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] "ABC is located near CBA"
> text <- "P r o j e c t : PQR is located near RQP, highlights some greenary"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] " PQR is located near RQP"

That should work!

Make your regex a bit more flexible: [^:]+:\s*([^,]+),.*

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "P r o j e c t : PQR is located near RQP, highlights some greenary")
[1] "PQR is located near RQP"

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "Project:ABC is located near CBA, being too far from city  ")
[1] "ABC is located near CBA"
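If you want to apply this to a whole vector, here is a minimal sketch of a wrapper. The function name extract_project, the anchors, and the NA fallback are my own assumptions, not part of the answer above. The fallback is there because sub() returns the input unchanged when the pattern does not match.

```r
# Hypothetical helper: text between the first ":" and the next ",".
extract_project <- function(x) {
  pattern <- "^[^:]+:\\s*([^,]+),.*$"
  out <- sub(pattern, "\\1", x)
  # sub() leaves non-matching strings untouched, so flag them as NA instead.
  out[!grepl(pattern, x)] <- NA_character_
  trimws(out)
}

res <- extract_project(c(
  "Project:ABC is located near CBA, being too far from city  ",
  "P r o j e c t : PQR is located near RQP, highlights some greenary",
  "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No",
  "no delimiters here"
))
print(res)
```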