Remove line breaks, paragraph breaks in csv file using R
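I have a csv file that contains some line breaks or paragraph breaks. I know this because when I open the csv file in a Word document, I see the pilcrow symbol ¶ after a paragraph and before the start of a new one. How can I remove these line breaks from the csv file in R? Any help is much appreciated.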
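PAST MEDICAL HISTORY
1. Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002.
2. Tachy/brady syndrome.
3. Insulin-dependent diabetes. Has been diabetic for approximately 35 years.
4. Hypertension, well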
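A well-formed csv file has a line break at the end of every row, so that any parser can tell where a row ends (for example, if you write a csv file by hand in Python, you have to include the \n newline at the end of each row). Try opening the csv file directly in R and inspecting its contents with head(your_file); you should see it displayed the way you would expect.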
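If the unwanted breaks sit inside quoted fields rather than at the ends of rows, one possible approach is to read the file normally and then replace the embedded newline characters in the affected column. This is a minimal sketch, assuming a made-up file with columns named id and note (neither name comes from the question):

```r
# Hypothetical example file: the note field of row 1 spans two physical lines.
path <- tempfile(fileext = ".csv")
writeLines(c('id,note',
             '1,"first paragraph',
             'second paragraph"',
             '2,"single line"'), path)

# read.csv keeps a quoted field together even when it contains a newline.
records <- read.csv(path, stringsAsFactors = FALSE)

# Replace each run of carriage returns / line feeds with a single space.
records$note <- gsub("[\r\n]+", " ", records$note)
records$note
```

Replacing with a space rather than an empty string avoids gluing the last word of one paragraph to the first word of the next.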
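(Note: your example is clearly not a csv file.)
Here is a test case in which you only want to remove the empty lines. This is the file test.txt (it contains a misspelling):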
some header text
more text
even omre text
--------------------
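# Read the file into a character vector, one element per line
txt <- readLines("test.txt")
# Keep only the lines that contain at least one character
newtext <- txt[nchar(txt) > 0]
newtext
#[1] "some header text" "more text" " even omre text"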
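The nchar() filter keeps lines that contain only spaces or tabs. If those should count as empty too, a variant along these lines may help. The sample vector stands in for the file contents, and the output file name cleaned.txt is made up for illustration:

```r
# Stand-in for readLines("test.txt"): one empty line and one whitespace-only line.
txt <- c("some header text", "", "more text", "   ", " even omre text")

# trimws() strips surrounding whitespace; nzchar() is TRUE for non-empty strings.
newtext <- txt[nzchar(trimws(txt))]
newtext

# Write the kept lines to a new file (hypothetical name) in a temporary directory.
out_path <- file.path(tempdir(), "cleaned.txt")
writeLines(newtext, out_path)
```

Only the filter uses the trimmed text, so the kept lines retain their original leading whitespace.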
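To handle the numbered lines (lines that start with a digit followed by a period), you can post-process the result with sub(), which strips the leading number: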
txt <- "PAST MEDICAL HISTORY

1. Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002.
2. Tachy/brady syndrome.
3. Insulin-dependent diabetes. Has been diabetic for approximately 35 years. 
4. Hypertension, well"
# Split the string into one element per line
newtxt <- readLines(textConnection(txt))
# Strip the leading run of digits and periods from each line
sub("^[[:digit:].]+", "", newtxt)
#------------------------
[1] "PAST MEDICAL HISTORY"
[2] ""
[3] " Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002."
[4] " Tachy/brady syndrome."
[5] " Insulin-dependent diabetes. Has been diabetic for approximately 35 years. "
[6] " Hypertension, well"
> sub("^[[:digit:].]+", "", newtxt[nchar(newtxt)>0])
[1] "PAST MEDICAL HISTORY"
[2] " Persistent atrial fibrillation with atrial flutter, status-post atrial flutter ablation line in October of 2002."
[3] " Tachy/brady syndrome."
[4] " Insulin-dependent diabetes. Has been diabetic for approximately 35 years. "
[5] " Hypertension, well"
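The pattern above leaves the space that followed each number, and it would also strip a leading run of bare periods. A slightly stricter sketch, assuming every numbered line has the form digits, period, optional whitespace (the short sample vector, including the "12." entry, is invented for illustration):

```r
# Illustrative input: a header, an empty line, and two numbered lines.
history_lines <- c("PAST MEDICAL HISTORY", "",
                   "1. Tachy/brady syndrome.", "12. Hypertension, well")

# Drop empty lines first, then remove "<digits>." plus any whitespace after it.
kept <- history_lines[nzchar(history_lines)]
cleaned <- sub("^[[:digit:]]+\\.[[:space:]]*", "", kept)
cleaned
```

Lines that do not start with a number, such as the header, pass through sub() unchanged.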