R: pasting (or combining) a variable number of rows together as one
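I have a text file that I am trying to parse so I can put the information into a data frame. Each 'event' may or may not have some notes, and the notes can span a varying number of lines. I need to concatenate the notes for each event into a single string to store in one column of the data frame.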
ID: 20470
Version: 1
notes:
ID: 01040
Version: 2
notes:
The customer was late.
Project took 20 min. longer than anticipated
Work was successfully completed
ID: 00000
Version: 1
notes:
Customer was not at home.
ID: 00000
Version: 7
notes:
Fax at 2:30 pm
Called but no answer
Visit home no answer
Left note on door with call back number
Made a final attempt on 12/5/2013
closed case on 12/10 with nothing resolved
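For example, for the second event (ID: 01040), the notes should be one long string, "The customer was late. Project took 20 min. longer than anticipated Work was successfully completed", which is then stored in the notes column of the data frame.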
For each event, I know how many lines the notes span.
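If the number of note lines per event really is known in advance, one option is to label every note line with the index of its event and paste within each group. This is a minimal sketch, assuming the note lines have already been pulled out in file order into a vector `noteLines` and the per-event counts are in `nLines`; both names and their contents are hypothetical, not taken from the question.

```r
# Hypothetical inputs: all note lines in file order, and how many belong to each event
noteLines <- c("The customer was late.",
               "Project took 20 min. longer than anticipated",
               "Work was successfully completed",
               "Customer was not at home.")
nLines <- c(0, 3, 1)  # event 1 has no notes, event 2 has three lines, event 3 has one

# Label each note line with its event index; explicit factor levels keep
# events that have zero note lines as empty groups
group <- factor(rep(seq_along(nLines), nLines), levels = seq_along(nLines))

# Join the lines of each event into one space-separated string
# (paste(character(0), collapse = " ") gives "" for events without notes)
eventNotes <- unname(vapply(split(noteLines, group), paste, character(1), collapse = " "))
print(eventNotes)
```

Using a factor with explicit levels matters here: without it, an event with zero note lines would silently disappear from the result instead of getting an empty string.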
Something like this (honestly, you'd be happier and learn more by working it out yourself; I'm just procrastinating between two tasks):
x <- readLines("R/xample.txt") # you'll probably read it from a file
ids <- grep("^ID:", x) # detecting lines starting with ID:
versions <- grep("^Version:", x)
notes <- grep("^notes:", x)
nStart <- notes + 1 # lines where the notes start
nEnd <- c(ids[-1]-1, length(x)) # notes end one line before the next ID: line
ids <- sapply(strsplit(x[ids], ": "), "[[", 2)
versions <- sapply(strsplit(x[versions], ": "), "[[", 2)
notes <- mapply(function(i, j) {
  if (j < i) return("") # "notes:" is directly followed by the next ID: line (or end of file)
  lines <- trimws(x[i:j]) # strip leading/trailing spaces from each note line
  paste(lines[lines != ""], collapse = " ") # drop blank lines and join the rest
}, nStart, nEnd)
df <- data.frame(ID=ids, ver=versions, note=notes, stringsAsFactors=FALSE)
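A different technique, not the one used in the answer: number the events with `cumsum(grepl("^ID:", x))` so every line carries the index of the event it belongs to, `split()` the lines into per-event blocks, and parse each block on its own. This is a sketch, assuming every event starts with an `ID:` line followed by `Version:` and `notes:` lines as in the sample data; `parseEvent` is a hypothetical helper name.

```r
# Sample lines copied from the dput() output in the question
x <- c("ID: 20470", "Version: 1", "notes: ", " ", " ", "ID: 01040",
       "Version: 2", "notes: ", " The customer was late.",
       "Project took 20 min. longer than anticipated",
       "Work was successfully completed", "", "ID: 00000", "Version: 1",
       "notes: ", " Customer was not at home.", "", "ID: 00000", "Version: 7",
       "notes: ", " Fax at 2:30 pm", "Called but no answer", "Visit home no answer",
       "Left note on door with call back number", "Made a final attempt on 12/5/2013",
       "closed case on 12/10 with nothing resolved ")

# Each "ID:" line bumps the counter, so all lines of one event share a number
event <- cumsum(grepl("^ID:", x))
blocks <- split(x, event)

# Hypothetical helper: turn one block of lines into a one-row data frame.
# Assumes the block contains exactly one ID:, Version: and notes: line.
parseEvent <- function(b) {
  notesAt <- grep("^notes:", b)
  noteLines <- trimws(b[-seq_len(notesAt)])  # everything after the "notes:" line
  data.frame(
    ID   = sub("^ID:\\s*", "", b[grep("^ID:", b)]),
    ver  = sub("^Version:\\s*", "", b[grep("^Version:", b)]),
    note = paste(noteLines[noteLines != ""], collapse = " "),
    stringsAsFactors = FALSE
  )
}

df <- do.call(rbind, lapply(blocks, parseEvent))
rownames(df) <- NULL
print(df)
```

Because each block is parsed independently, an event whose `notes:` line is followed directly by the next `ID:` line simply gets an empty note, with no start/end index arithmetic to get wrong.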
Input data:
> dput(x)
c("ID: 20470", "Version: 1", "notes: ", " ", " ", "ID: 01040",
"Version: 2", "notes: ", " The customer was late.", "Project took 20 min. longer than anticipated",
"Work was successfully completed", "", "ID: 00000", "Version: 1",
"notes: ", " Customer was not at home.", "", "ID: 00000", "Version: 7",
"notes: ", " Fax at 2:30 pm", "Called but no answer", "Visit home no answer",
"Left note on door with call back number", "Made a final attempt on 12/5/2013",
"closed case on 12/10 with nothing resolved ")