Creating identical formatting after text document modification in R

Question

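I am trying to modify a text document that has specific header information kept separate from the data frame:

Copy of the document: http://www.filedropper.com/elist

I can load the document and edit it: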

library(data.table) # provides shift()

data <- read.table('elist.txt')

d <- data[!(data$V3==1),] # removes pointless 1 triggers
d2 <- d[!(d$V3>199),] # removes probe triggers
d3 <- d2[!(d2$V3<4),] # removes more probe triggers
d4 <- d3[!(d3$V3 == shift(d3$V3)),] # removes duplicate triggers
d5 <- d4[!(d4$V3 == shift(d4$V3+1)),] # removes +1 duplicate triggers

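However, I can't figure out how to export the document so that it contains the same header information; simply using the write.table() function doesn't seem to work.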

My question is: how can I modify the document while keeping its format as close as possible to the original?

Answer 1

You can use readLines to read the initial lines:

heading_text <- readLines('elist.txt') # read all lines
heading_text <- heading_text[grepl("^#", heading_text)] # keep only the comment lines (starting with #)
heading_text <- trimws(gsub("^#|\\t", " ", heading_text)) # replace the leading # and the tab separators (\t) with spaces, then trim whitespace

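You can use a regex to select the header line. In this case, I selected the line that contains the word item. Then you need to collapse each run of whitespace into a single column separator.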

header <- gsub("\\s+", ",", heading_text[grepl("item", heading_text)])
header <- unlist(strsplit(header, ","))
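To make the effect of these two steps concrete, here is a small sketch run on a made-up comment line. The string in raw_line and the column names in it are hypothetical stand-ins, not the actual header row of elist.txt.

```r
# Hypothetical comment line standing in for the real header row of elist.txt
raw_line <- "#\titem\tbepoch\tecode"

# Same cleaning as above: replace the leading # and the tabs with spaces, then trim
clean <- trimws(gsub("^#|\\t", " ", raw_line))

# Collapse each run of whitespace into one comma, then split on the commas
header_demo <- unlist(strsplit(gsub("\\s+", ",", clean), ","))
print(header_demo)
```

The result is a character vector with one element per column name, which is the shape names(data) expects.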

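Note that the data is not read in correctly: it has 12 columns, whereas your header has length 11. You need to fix that. In this example, I just dropped the last column.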

data <- read.table('elist.txt')
data <- data[1:11]
names(data) <- header
head(data)
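For the export step the question asks about, one possible approach is to write the original comment lines back first and then append the edited rows below them. The following is only a sketch under assumptions: it uses a small made-up file instead of elist.txt, assumes that all comment lines come before the data, and assumes that a tab is an acceptable separator. The original column spacing is not guaranteed to be reproduced exactly.

```r
# Build a small stand-in file: '#' comment lines followed by tab-separated data
src <- tempfile(fileext = ".txt")
writeLines(c("# hypothetical header line 1",
             "# item\tcode\tvalue",
             "1\t10\t0.5",
             "2\t1\t0.7",
             "3\t12\t0.9"), src)

all_lines <- readLines(src)
comment_lines <- all_lines[grepl("^#", all_lines)]  # keep the header block verbatim
demo <- read.table(src)                             # read.table skips '#' lines by default
kept <- demo[demo$V2 != 1, ]                        # example edit: drop rows whose 2nd column is 1

out <- tempfile(fileext = ".txt")
writeLines(comment_lines, out)                      # 1) original comment lines, unchanged
write.table(kept, out, append = TRUE, sep = "\t",   # 2) edited rows appended below them
            quote = FALSE, row.names = FALSE, col.names = FALSE)

result <- readLines(out)
print(result)
```

Setting col.names = FALSE keeps write.table() from adding its own header row, since the column names already live in the comment block that was written first.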
