Importing specific rows of a text file while keeping header

I have a text file that looks like this:

"Saved at:19 January 2015, 1:01 PM"
"Course"    "Time"  
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"

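When I use read.delim, I specify skip=1 so that the second line is used as the header. Sometimes a line such as line 11 (it could be any other line) should be skipped during the import. If there is a way to do it, preferably in base R, I would like to

  1. skip the first line,
  2. use the second line as the header, and
  3. skip every line that does not start with "EDPY 301 (SEM J4 Wi14)".

For reference, this is the code I use to import the text file: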

read.delim("path to the file",header=TRUE,stringsAsFactors=FALSE,strip.white=TRUE,na.strings=c("NA",""),skip=1)

Thanks,

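I don't know of a way to conditionally exclude lines with read.table, but reading the file with readLines and then using grep or grepl to build a vector of the lines to keep seems to work well: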

Lines <- readLines(textConnection('"Saved at:19 January 2015, 1:01 PM"
"Course"    "Time"  
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"'))

good <- grep("^\\\"EDPY", Lines)
inp <- read.table(text=Lines[good], col.names = c("Course","Time" ))

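The pattern string needs three backslashes after the start-of-line anchor: the first two form a literal backslash, and the third escapes the double quote.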
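
For an actual file rather than a text connection, the same idea could be wrapped up roughly as follows. This is only a sketch under some assumptions: the helper name read_filtered and its prefix argument are made up for illustration, the column names are taken from the second line instead of being hard-coded, and startsWith (base R 3.3.0 or later) is swapped in for grep so that the parentheses in the course name need no regex escaping.

```r
# Hypothetical helper, not part of the original answer.
# Drops line 1, reads the column names from line 2, and keeps only the
# data lines that begin with a double quote followed by `prefix`.
read_filtered <- function(path, prefix = "EDPY 301 (SEM J4 Wi14)") {
  lines <- readLines(path)

  # Line 2 holds the quoted column names; scan() strips the quotes.
  header <- scan(text = lines[2], what = "", quiet = TRUE)

  # Everything after the first two lines is candidate data.
  body <- lines[-(1:2)]

  # Fixed-string prefix test, so "(" and ")" are treated literally.
  body <- body[startsWith(body, paste0('"', prefix))]

  read.table(text = body, col.names = header, stringsAsFactors = FALSE)
}

# Small demonstration on a temporary file with one stray line.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '"Saved at:19 January 2015, 1:01 PM"',
  '"Course"    "Time"',
  '"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 7:28 PM"',
  'The DI Module Exam contains 16 mul..."',
  '"EDPY 301 (SEM J4 Wi14)"    "28 January 2014, 6:57 PM"'
), tmp)

inp <- read_filtered(tmp)
print(inp)
```

If you would rather stay with grep, writing the pattern in single quotes, as in grep('^"EDPY', Lines), should select the same lines without any backslashes.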