Creating line breaks based on regex in R

Question

I am new to R. I extracted some text from the web and pasted it into a text file. After reading the file into R, the lines look like this:

    c("HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...", 
"", "", "Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9596159288 . ", 
"Account name Tariq Ahmad Mir", "Branch: WATRIGAM", "Contact: 1954-235307", 
"", "IFSC Code: SBIN0004591 ", "", "", "MICR Code: 193002321...")

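Each review ends with "...", which acts as the delimiter between reviews. I am trying to join each review into a single line. I tried the following code: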

a <- readLines("banking1.txt", warn = FALSE)
a <- a[sapply(a, nchar) > 0]
a <- paste(a, collapse = ",")

This gave me the following output:

"HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...,Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9596159288 . ,Account name Tariq Ahmad Mir,Branch: WATRIGAM,Contact: 1954-235307,IFSC Code: SBIN0004591 ,MICR Code: 193002321..."

I then tried to split the result on the "..." delimiter:

a <- strsplit(a, "...,")
a <- strsplit(a, "...,")[[1]]
a <- noquote(strsplit(a, "...,")[[1]])

and many other similar variants, but the output is not what I expected. What I need is:

HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...
Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9512139288 . Account name Tariq Ahmad Mir Branch: MAGRITAW Contact: 1954-235307 IFSC Code: AVCN0001234 MICR Code: 19300321...

Can anyone help?

Answer 1

You can use a negative lookbehind.

x <- c("HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...", 
  "", "", "Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9596159288 . ", 
  "Account name Tariq Ahmad Mir", "Branch: WATRIGAM", "Contact: 1954-235307", 
  "", "IFSC Code: SBIN0004591 ", "", "", "MICR Code: 193002321...")
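# Join all lines with newlines so the review boundaries can be located.
y <- paste(x, collapse = "\n")
# Replace newline runs that are not preceded by "..." with a space; the newline right after "..." survives.
z <- gsub("(?<!\\.{3})\n+", " ", y, perl = TRUE)
# Split on the remaining newlines, which now separate the reviews.
z <- strsplit(z, "\n")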
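
As a side note on why the attempts in the question did not work: in a regular expression an unescaped `.` matches any character, so the pattern `"...,"` matches any three characters followed by a comma (including `015,` in "7 nov 2015, i dont"), not just a literal "...,". If the lookbehind feels opaque, a possible alternative is to skip the join-and-split step and group the lines directly. The following is only a minimal sketch under assumptions: the sample lines are shortened, hypothetical stand-ins for the real data, and the names `ends`, `grp` and `reviews` are introduced here for illustration.

```r
# Sample lines as readLines() might return them (shortened, hypothetical text).
x <- c("first review, still going", "", "end of first review...",
       "", "", "second review starts", "Branch: X ", "Code: 123...")

# Drop empty lines.
x <- x[nzchar(x)]

# TRUE for every line that closes a review, i.e. ends in "..."
# (optional trailing whitespace is tolerated).
ends <- grepl("\\.{3}\\s*$", x)

# Group id per line: a new group starts right after each closing line.
grp <- cumsum(c(0, head(ends, -1)))

# Paste the lines of each group into a single string.
reviews <- unname(vapply(split(x, grp), paste, character(1), collapse = " "))

print(reviews)
```

Each element of `reviews` is then one complete review, and `writeLines(reviews)` would print one review per line. Unlike splitting on the delimiter, this keeps the trailing "..." on every review.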

