Extract sentences of a character vector satisfying two conditions in R

Question

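Suppose we load a full-text file into R as a character vector. I'm looking for code that extracts all the text between two periods ("."), provided that the text between those two periods contains both "and the" and at least one "%".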
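For reference, one common way to get a text file into R as a single string is to read it line by line and collapse the lines with `paste()`. This is a minimal sketch: it writes a small demo file to a temporary path so it runs on its own; in practice `path` would point at your real file.

```r
# Demo only: write a small two-line file to a temporary path.
# In practice, set `path` to your own full-text file instead.
path <- tempfile(fileext = ".txt")
writeLines(c("Walmart stocks remained the same.  Sony reported an increase,",
             "and the percent was posted at 1.0%."), path)

# readLines() returns one element per line of the file;
# warn = FALSE silences the warning about a missing final newline
lines <- readLines(path, warn = FALSE)

# Collapse the lines into one string, putting a space where each line break was
character <- paste(lines, collapse = " ")
character
```

Collapsing with a space matters because a sentence can be wrapped across two lines of the file, as in the demo above.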

character <- as.character("Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%.")

For this short example, I would like to get output along the lines of:

[1] Sony reported an increase, and the percent was posted at 1.0%.
[2] And the percent of increase for Best Buy was 2.5%.

Answer 1

A quick solution:

library(magrittr)
"Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%." %>%
  ## split the string at sentence boundaries: mark every ". " with a tab, then split on the tab
  ## (backslashes must be doubled inside R strings)
  gsub("\\.\\s", ".\t", .) %>%
  strsplit("\t") %>% unlist() %>%
  ## keep only sentences that contain "and the" (irrespective of case)
  grep("and the", x = ., value = TRUE, ignore.case = TRUE) %>%
  ## keep only the sentences that end with "%."
  grep("%\\.$", x = ., value = TRUE) %>%
  ## remove leading white space
  gsub("^\\s+", "", x = .)
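Note that the pipeline above keeps only sentences that *end* in "%.", which is stricter than the question's "at least one %" condition. A base-R sketch that tests the two stated conditions directly is shown below. It assumes a sentence boundary is a period followed by whitespace, so abbreviations such as "e.g. " would also trigger a split, while decimals like "1.0" would not.

```r
txt <- "Walmart stocks remained the same.  Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same.  And the percent of increase for Best Buy was 2.5%."

# Split on whitespace that follows a period; the lookbehind (?<=\\.)
# keeps the period attached to its sentence and requires perl = TRUE
sentences <- strsplit(txt, "(?<=\\.)\\s+", perl = TRUE)[[1]]

# Condition 1: the sentence contains "and the" (case-insensitive)
has_and_the <- grepl("and the", sentences, ignore.case = TRUE)

# Condition 2: the sentence contains at least one literal "%" anywhere
has_percent <- grepl("%", sentences, fixed = TRUE)

# Keep only sentences that satisfy both conditions
result <- sentences[has_and_the & has_percent]
print(result)
```

Using `grepl()` to build two logical vectors keeps each condition separate, so adding or changing a condition only means adding another logical vector to the final `&`.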
