Regex OR condition for text in R
Suppose I have some text:
1) "Project:ABC is located near CBA, being too far from city "
2) "P r o j e c t : PQR is located near RQP, highlights some greenary"
I want to extract the text between the word "Project" and ",", so that my output is "ABC is located near CBA" from text1 and "PQR is located near RQP" from text2. For this I used the regex
x="Project:ABC is located near CBA, being too far from city "
sub(".*Project: *(.*?) *, .*", "\\1", x)
Output:
ABC is located near CBA
But for text2 it does not give the correct output. How can I include an OR condition so that both of my cases are handled? Any suggestions would help.
Thanks
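A short sketch of why the original pattern fails on text2: text2 never contains the literal "Project:", so the regex simply does not match, and sub() then returns the input unchanged instead of raising an error. The names x1, x2 and pat are only for illustration.

```r
# The two example strings from the question
x1 <- "Project:ABC is located near CBA, being too far from city "
x2 <- "P r o j e c t : PQR is located near RQP, highlights some greenary"

# The asker's pattern (backreference written as "\\1" in the call below)
pat <- ".*Project: *(.*?) *, .*"

# Does the pattern match at all?
grepl(pat, c(x1, x2))
#> [1]  TRUE FALSE

# With no match, sub() silently hands back the original string
identical(sub(pat, "\\1", x2), x2)
#> [1] TRUE
```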
You can use a regex with lookahead and lookbehind assertions. A small example using the stringr package:
Vec <- c("Project:ABC is located near CBA, being too far from city",
"P r o j e c t : PQR is located near RQP, highlights some greenary")
library(stringr)
str_extract(Vec, "(?<=:).*(?=,)")
#> [1] "ABC is located near CBA" " PQR is located near RQP"
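If your input is more complex, you should adjust the regex, since it may not be strict enough: at the moment it captures anything between the first ":" and the last "," (note also the leading space in the second result).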
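The same lookaround idea can be sketched in base R by switching to the Perl-compatible engine with perl = TRUE, since lookbehind is not available with the default engine. Assuming you want the text up to the first comma rather than the last, the lazy .*? stops at the first ",", and trimws() drops the leading space. The third string is a hypothetical example with two commas, added only to show the difference.

```r
Vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary",
         "Project:XYZ, near ZYX, far from city")  # hypothetical string with two commas

# Greedy: from the first ":" up to the LAST ","
greedy <- regmatches(Vec, regexpr("(?<=:).*(?=,)", Vec, perl = TRUE))

# Lazy: from the first ":" up to the FIRST ","; trimws() strips surrounding spaces
lazy <- trimws(regmatches(Vec, regexpr("(?<=:).*?(?=,)", Vec, perl = TRUE)))

greedy
lazy
```

For the hypothetical third string, the greedy version returns "XYZ, near ZYX" while the lazy one returns "XYZ"; the first two strings give the same text either way (apart from the trimmed space).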
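One option in base R is gsub: match any characters (.*) up to a ":" followed by zero or more spaces (\s*), or (|) a "," followed by any other characters, and replace the match with a blank (""). On the two example strings: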
gsub(".*:\\s*|,.*", "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP"
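To see what each side of the | does, the single gsub() call can be split into two sub() calls, one per alternative. On these inputs the result is identical, because the only ":" comes before the ",". The intermediate names are just illustrative.

```r
Vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary")

# Left alternative: everything up to ":" plus any whitespace after it
no_prefix <- sub(".*:\\s*", "", Vec)
# Right alternative: the first "," and everything after it
no_suffix <- sub(",.*", "", no_prefix)

no_suffix
#> [1] "ABC is located near CBA" "PQR is located near RQP"

# Same as the one-line alternation
identical(no_suffix, gsub(".*:\\s*|,.*", "", Vec))
#> [1] TRUE
```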
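If we need to match "Project" followed by ":" (with optional whitespace between the letters, as in "P r o j e c t :"):
# gsub("", ...) inserts \s* between the letters of "Project" ("\\\\" gives one literal backslash);
# "\\s*:" also allows the space before the colon in "P r o j e c t :"
pat <- paste0(gsub("", "\\\\s*", "Project"), "\\s*:\\s*|\\s*,.*")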
gsub(pat, "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP" "Ganga gnd A3 And 3.."
Data:
Vec <- c("Project:ABC is located near CBA, being too far from city",
"P r o j e c t : PQR is located near RQP, highlights some greenary",
"Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"
)
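If you would rather write the OR condition from the question literally, a capture group with alternation is another option. This sketch assumes the only two spellings are "Project" and "P r o j e c t" and that the wanted text itself contains no comma; perl = TRUE is used so the greedy \\s* after ":" reliably takes the space.

```r
Vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary",
         "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No")

# (Project|P r o j e c t) is the OR condition; \\s*:\\s* allows spaces around ":";
# ([^,]*) captures everything up to the first comma
out <- sub("^(Project|P r o j e c t)\\s*:\\s*([^,]*),.*$", "\\2", Vec, perl = TRUE)
out
```

This yields "ABC is located near CBA", "PQR is located near RQP" and "Ganga gnd A3 And 3.."; a string matching neither spelling is returned unchanged by sub().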
If the word "Project" itself doesn't matter:
> text
[1] "Project:ABC is located near CBA, being too far from city "
> substr(text,grep(":",strsplit(text,'')[[1]]),grep(",",strsplit(text,'')[[1]]))
[1] ":ABC is located near CBA,"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] "ABC is located near CBA"
> text <- "P r o j e c t : PQR is located near RQP, highlights some greenary"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] " PQR is located near RQP"
That should work!
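The strsplit()/grep() version handles one string at a time and assumes exactly one ":" and one ",". A vectorised sketch of the same substr() idea uses regexpr(fixed = TRUE) to locate the first ":" and the first "," in every element at once; regexpr() returns -1 when a character is absent, so strings without them would need a check first.

```r
Vec <- c("Project:ABC is located near CBA, being too far from city ",
         "P r o j e c t : PQR is located near RQP, highlights some greenary")

# Position of the first ":" and the first "," in each string (-1 if absent)
colon <- regexpr(":", Vec, fixed = TRUE)
comma <- regexpr(",", Vec, fixed = TRUE)

# Characters strictly between them, with surrounding spaces removed
out <- trimws(substr(Vec, colon + 1, comma - 1))
out
#> [1] "ABC is located near CBA" "PQR is located near RQP"
```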
Make your regex a bit more flexible: [^:]+:\s*([^,]+),.*
> sub("[^:]+:\\s*([^,]+),.*", "\\1", "P r o j e c t : PQR is located near RQP, highlights some greenary")
[1] "PQR is located near RQP"
and
> sub("[^:]+:\\s*([^,]+),.*", "\\1", "Project:ABC is located near CBA, being too far from city ")
[1] "ABC is located near CBA"
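A related sketch pulls out the capture group with regexec() and regmatches() instead of substituting with sub(). One practical difference: sub() returns a non-matching string unchanged, whereas this approach gives NA for it. The third string is a hypothetical example with no colon.

```r
x <- c("P r o j e c t : PQR is located near RQP, highlights some greenary",
       "Project:ABC is located near CBA, being too far from city ",
       "no colon here, so no match")  # hypothetical non-matching string

m <- regexec("[^:]+:\\s*([^,]+),", x, perl = TRUE)

# Each list element is c(full match, group 1), or character(0) when there is
# no match; taking element 2 of character(0) yields NA
out <- vapply(regmatches(x, m), `[`, character(1), 2)
out
```

Here out holds "PQR is located near RQP", "ABC is located near CBA" and NA.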