Select second word in between known phrases - R regex
I want to select the text between two known phrases, but exclude the first word, using R and a regular expression. The format is as follows:
"known phrase + unknown_word + target phrase + known_word + bla bla"
For example:
Tesco Plc sells coffee beans today in stores over the uk
Known phrase = "Tesco Plc"
Unknown word = "sells"
Target phrase = "coffee beans"
known word = "today"
bla bla (unrelated text) = "in stores over the uk"
First attempt:
text = "Tesco Plc sells coffee beans today in stores over the uk"
known_phrase = "Tesco Plc"
known_word = "today"
# code
library(stringr)
str_extract(text, paste0("(?<=", known_phrase, ").*(?=", known_word, ")"))
This selects both the unknown_word and the target phrase, but I only want the target phrase.
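The behaviour described above can be reproduced with a small sketch. It uses base R's `regmatches()` and `regexpr()` with `perl = TRUE` instead of stringr, purely so that it runs without extra packages; the variable name `first_try` is made up for illustration.

```r
text <- "Tesco Plc sells coffee beans today in stores over the uk"
known_phrase <- "Tesco Plc"
known_word <- "today"

# Same lookbehind/lookahead pattern as in the first attempt.
pattern <- paste0("(?<=", known_phrase, ").*(?=", known_word, ")")

# Everything between the two anchors is returned, including the
# unwanted first word and the surrounding spaces.
first_try <- regmatches(text, regexpr(pattern, text, perl = TRUE))
print(first_try)
# [1] " sells coffee beans "
```

Nothing in the pattern tells the engine to skip one word after the known phrase, so `.*` simply takes everything up to the known word.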
/
You can use
x <- "Tesco Plc sells coffee beans today in stores over the uk"
stringr::str_match(x, "Tesco\\s+Plc\\s+\\w+\\s+(.*?)\\s+today")[,2]
## OR
Known_phrase = "Tesco Plc"
known_word = "today"
stringr::str_match(x, paste0(Known_phrase, "\\s+\\w+\\s+(.*?)\\s+", known_word))[,2]
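To make the role of each pattern piece explicit, here is a minimal base R sketch of the same idea. It assumes `regexec()` with `perl = TRUE` in place of `stringr::str_match()`, and the names `pattern`, `m`, and `target` are illustrative only.

```r
text <- "Tesco Plc sells coffee beans today in stores over the uk"
known_phrase <- "Tesco Plc"
known_word <- "today"

# known phrase, whitespace, ONE word to skip (\\w+), whitespace,
# then a lazy capture group that stops at the whitespace before the known word.
pattern <- paste0(known_phrase, "\\s+\\w+\\s+(.*?)\\s+", known_word)

# regexec() returns the full match followed by each capture group.
m <- regmatches(text, regexec(pattern, text, perl = TRUE))
target <- m[[1]][2]  # element 1 is the whole match, element 2 is group 1
print(target)
# [1] "coffee beans"
```

The `\\w+` consumes the unknown word outside the capture group, which is why only the target phrase ends up in the result. The backslashes are doubled because R string literals need `\\` to produce a single regex backslash.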
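Because your variables are dynamic, you may need an escape function so that any regex metacharacters they contain are matched literally:
regex.escape <- function(string) {
  # Prefix every regex metacharacter with a backslash
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
Known_phrase = "Tesco Plc"
known_word = "today"
stringr::str_match(x, paste0(regex.escape(Known_phrase), "\\s+\\w+\\s+(.*?)\\s+", regex.escape(known_word)))[,2]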
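The escaping step only matters when the known phrase contains metacharacters, which "Tesco Plc" does not. As a rough illustration, the sketch below uses a made-up company name, "A.B (UK)", and base R's `regexec()` with `perl = TRUE` rather than stringr; both choices are assumptions for the example, not part of the original question.

```r
# Prefix every regex metacharacter with a backslash.
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}

# Hypothetical input whose known phrase contains '.', '(' and ')'.
text <- "A.B (UK) sells coffee beans today in stores"
known_phrase <- "A.B (UK)"
known_word <- "today"

escaped <- regex.escape(known_phrase)
print(escaped)
# [1] "A\\.B \\(UK\\)"

pattern <- paste0(escaped, "\\s+\\w+\\s+(.*?)\\s+", regex.escape(known_word))
m <- regmatches(text, regexec(pattern, text, perl = TRUE))
target <- m[[1]][2]
print(target)
# [1] "coffee beans"
```

Without escaping, the parentheses in "A.B (UK)" would be read as a capture group, so the literal "(" and ")" in the text would not match and the target phrase would not be found.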