Using a vector in a regex to extract substrings with only a known start and end
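How can I use a vector in a regex so that everything from any element of the vector up to another word is extracted?
I need to use str_match to extract multiple substrings from a series of long strings in a data frame. Each substring starts with a tree species and ends with the word "links".
Because the substrings I need can start with many different species, I created a vector called tree.sp to hold all the possibilities.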
test.df <- data.frame(
  Heading_ChLk = c("West", 40.00, 80.00),
  Bound_Desc = c("On the Base line along the south side of section 34 T 1 N, R 29 W of the 5th PM.",
                 "Set a 1/4 section corner post from which a pine 9 inches diameter bears N 43 E 35 links and a black oak 15 inches diameter bears S 10 E 30 links",
                 "Set a post corner to sections 33 & 34 from which a white oak 17 inches diameter bears N 32 W 57 links and a black oak 20 inches diameter bears N 46 E 19 links.")
)
tree.sp <- c("pine|black oak|white oak")
You can use str_match as follows:
library(stringr)
test.df$result <- str_match(test.df$Bound_Desc, sprintf('((%s).*links)', tree.sp))[, 2]
test.df$result
#[1] NA
#[2] "pine 9 inches diameter bears N 43 E 35 links and a black oak 15 inches diameter bears S 10 E 30 links"
#[3] "white oak 17 inches diameter bears N 32 W 57 links and a black oak 20 inches diameter bears N 46 E 19 links"
Similar code also works with str_extract:
str_extract(test.df$Bound_Desc, sprintf('(%s).*links', tree.sp))
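Because `.*` is greedy, the patterns above return one long match per row, running from the first species to the last "links". If each tree should come back as its own substring, one possible variation is a lazy quantifier combined with str_extract_all. This is only a sketch: the `species` vector, `pattern`, `desc`, and `trees` names are hypothetical and not part of the original post, and it assumes the species are kept as a true character vector and collapsed with paste.

```r
library(stringr)

# Hypothetical: species stored one per element instead of as a single string.
species <- c("pine", "black oak", "white oak")

# Collapse into one alternation; the lazy ".*?" stops at the first "links"
# after each species instead of running to the last one.
pattern <- sprintf("(%s).*?links", paste(species, collapse = "|"))

desc <- "Set a 1/4 section corner post from which a pine 9 inches diameter bears N 43 E 35 links and a black oak 15 inches diameter bears S 10 E 30 links"

# str_extract_all returns a list with one character vector per input string.
trees <- str_extract_all(desc, pattern)[[1]]
trees
#[1] "pine 9 inches diameter bears N 43 E 35 links"
#[2] "black oak 15 inches diameter bears S 10 E 30 links"
```

Applied to the whole column, str_extract_all(test.df$Bound_Desc, pattern) would give a list with one element per row, and rows without a match would give an empty character vector rather than NA.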