Creating line breaks based on regex in R
I am new to R. I extracted some text from the web and pasted it into a text file. It looks like this:
c("HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...",
"", "", "Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9596159288 . ",
"Account name Tariq Ahmad Mir", "Branch: WATRIGAM", "Contact: 1954-235307",
"", "IFSC Code: SBIN0004591 ", "", "", "MICR Code: 193002321...")
Each of these reviews ends with "...". I am trying to join each review into a single line. I tried the following code:
a <- readLines("banking1.txt", warn = FALSE)
a <- a[sapply(a, nchar) > 0]
a <- paste(a, collapse = ",")
This gave me the following output:
"HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...,Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9596159288 . ,Account name Tariq Ahmad Mir,Branch: WATRIGAM,Contact: 1954-235307,IFSC Code: SBIN0004591 ,MICR Code: 193002321..."
I am trying to split them on the "..." delimiter:
a <- strsplit(a, "...,")
a <- strsplit(a, "...,")[[1]]
a <- noquote(strsplit(a, "...,")[[1]])
and many other similar variations, but the output is not what I expect. What I need is:
HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...
Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9512139288 . Account name Tariq Ahmad Mir Branch: MAGRITAW Contact: 1954-235307 IFSC Code: AVCN0001234 MICR Code: 19300321...
Can anyone help?
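A note on why the `strsplit()` attempts above misbehave: by default the `split` argument is treated as a regular expression, so each `.` matches any character rather than a literal dot. The sketch below uses a shortened, made-up string (not the real data) to show the difference; `a_demo`, `broken` and `parts` are illustrative names only.

```r
# Hypothetical, shortened stand-in for the pasted string in the question
a_demo <- "first review...,second review, part one,part two..."

# Default: "...," is a regex meaning "any three characters, then a comma",
# so it also splits at the ordinary commas and eats the three characters before them
broken <- strsplit(a_demo, "...,")[[1]]

# fixed = TRUE treats "...," as a literal string
parts <- strsplit(a_demo, "...,", fixed = TRUE)[[1]]
print(parts)
```

Even with `fixed = TRUE` the delimiter itself is dropped from the pieces, and the commas introduced by `paste(a, collapse = ",")` remain inside each review, so joining with a different separator may be the simpler route.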
You can use a negative lookbehind.
x <- c("HR name as meena in malad west branch first source ltd called me for interview as openings in llyods chat process as banking process she told me 3 rounds of interview and other hr vl ask me these questions.As she said there r openings but when other hr taken my interview she told there r no...",
"", "", "Sir with due respect from 7 nov 2015, i dont receive my sms alerts from my registered mobile number as 9596159288 . ",
"Account name Tariq Ahmad Mir", "Branch: WATRIGAM", "Contact: 1954-235307",
"", "IFSC Code: SBIN0004591 ", "", "", "MICR Code: 193002321...")
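y <- paste(x, collapse = "\n")
# Replace every run of newlines that is NOT preceded by "..." with a space
z <- gsub("(?<!\\.{3})\n+", " ", y, perl = TRUE)
# Split on the newlines that remain, i.e. the ones right after "..."
z <- strsplit(z, "\n")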
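As a possible follow-up, here is a minimal sketch of tidying the result. It assumes you want a plain character vector with no leading spaces or doubled spaces; the input `demo` is a made-up, shortened vector with the same shape as `x`, and `reviews` is just an illustrative name.

```r
# Hypothetical, shortened input with the same shape as x
demo <- c("first review...", "", "", "second review ", "part two", "", "part three...")

y <- paste(demo, collapse = "\n")
# Same lookbehind idea: keep only the newlines that follow "..."
z <- gsub("(?<!\\.{3})\n+", " ", y, perl = TRUE)

# strsplit() returns a list; [[1]] extracts the character vector
reviews <- strsplit(z, "\n")[[1]]

# Collapse runs of spaces, then trim leading and trailing whitespace
reviews <- trimws(gsub(" +", " ", reviews))

# Print one review per line
writeLines(reviews)
```

The cleanup step is there because blank lines that follow a "..." line leave a leading space on the next review, and input lines with trailing spaces produce doubled spaces once joined.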