Remove line breaks, paragraph breaks in csv file using R

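I have a csv file that contains some line breaks or paragraph breaks. I know this because when I open the csv file in a Word document I see the pilcrow symbol ¶ at the end of each paragraph, before the next paragraph begins. How can I remove these line breaks from the csv file in R? Any help is much appreciated.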

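PAST MEDICAL HISTORY

  1. Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002.
  2. Tachy/brady syndrome.
  3. Insulin-dependent diabetes. Has been diabetic for approximately 35 years.
  4. Hypertension, well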

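A well-formed csv file has a newline at the end of every row, so that any parser can tell where a row ends (for example, if you write a csv file by hand in Python, you have to add the \n newline at the end of each row yourself). Try opening the csv file directly in R and inspecting its contents with head(your_file); you should see it displayed the way you would expect.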
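If the line breaks turn out to sit inside quoted fields rather than at the ends of rows, one possible approach is to read the file with read.csv() and then replace the embedded newline characters in the character columns. The following is only a minimal sketch under that assumption: the file contents, the column names id and note, and the helper strip_breaks() are made up for illustration and do not come from the question.

```r
# Hypothetical csv in which one quoted field spans two physical lines.
path <- tempfile(fileext = ".csv")
writeLines(c('id,note',
             '1,"first paragraph',
             'second paragraph"',
             '2,"single line"'), path)

# read.csv() keeps a line break inside a quoted field as part of the value.
dat <- read.csv(path, stringsAsFactors = FALSE)

# Hypothetical helper: replace each run of \r or \n with a single space.
strip_breaks <- function(x) gsub("[\r\n]+", " ", x)

# Apply the helper to character columns only; other columns stay untouched.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], strip_breaks)

dat$note
```

The cleaned data frame could then be written back with write.csv(dat, "cleaned.csv", row.names = FALSE), where the output file name is again just a placeholder.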

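Here is a test case, assuming you only want to remove the blank lines. This is the file test.txt (typos included). (Note: your example is clearly not a csv file.)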

some header text

more text
 even omre text

--------------------

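 # read the file as a character vector, one element per line
 txt <- readLines("test.txt")
 # keep only the lines that contain at least one character
 newtext <- txt[nchar(txt) > 0]
 newtext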
#[1] "some header text" "more text"        " even omre text"
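Note that nchar(txt) > 0 only drops lines that are completely empty. If some "blank" lines actually contain spaces or tabs, a test for at least one non-whitespace character may be a better fit. A small sketch, using the same content as test.txt plus one whitespace-only line that is added here purely as an assumption:

```r
# Content of test.txt, plus a hypothetical line holding only spaces.
txt <- c("some header text", "", "more text", "   ", " even omre text", "")

# nchar(txt) > 0 still keeps the whitespace-only line ...
kept_nchar <- txt[nchar(txt) > 0]

# ... while requiring one non-whitespace character drops it as well.
kept_nonblank <- txt[grepl("[^[:space:]]", txt)]

kept_nonblank
```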

To strip the numbering from numbered lines (lines that start with digits followed by a period), you can use sub():

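 # the numbered lines must start in the first column of the string,
 # otherwise the ^ anchor in the pattern below will not match them
 txt <- "PAST MEDICAL HISTORY

1. Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002.
2. Tachy/brady syndrome.
3. Insulin-dependent diabetes.  Has been diabetic for approximately 35 years.  
4. Hypertension, well"


 # split the string into one element per line
 newtxt <- readLines(textConnection(txt))
 # remove a leading run of digits and periods from each line
 sub("^[[:digit:].]+", "", newtxt)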
#------------------------
[1] "PAST MEDICAL HISTORY"                                                                                             
[2] ""                                                                                                                 
[3] " Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002."
[4] " Tachy/brady syndrome."                                                                                           
[5] " Insulin-dependent diabetes.  Has been diabetic for approximately 35 years.  "                                    
[6] " Hypertension, well"     

> sub("^[[:digit:].]+", "", newtxt[nchar(newtxt)>0])
[1] "PAST MEDICAL HISTORY"                                                                                             
[2] " Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002."
[3] " Tachy/brady syndrome."                                                                                           
[4] " Insulin-dependent diabetes.  Has been diabetic for approximately 35 years.  "                                    
[5] " Hypertension, well"
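If the goal is a single piece of text with no line breaks at all, the cleaned lines can be joined afterwards. The sketch below combines the steps shown above and then collapses the result with paste(); the input is a shortened, hypothetical version of the example, and the names cleaned and one_line are chosen here for illustration.

```r
# Lines as they might come back from readLines(); shortened example input.
newtxt <- c("PAST MEDICAL HISTORY", "",
            "1. Tachy/brady syndrome.", "2. Hypertension, well")

# Drop empty lines, strip the leading numbering, then trim surrounding spaces.
cleaned <- trimws(sub("^[[:digit:].]+", "", newtxt[nchar(newtxt) > 0]))

# Join all remaining lines with single spaces so no line breaks are left.
one_line <- paste(cleaned, collapse = " ")
one_line
```

Whether a space is the right separator depends on the data; collapse = "" or collapse = "; " would work the same way.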