Importing specific rows of a text file while keeping the header
I have a text file that looks like this:
"Saved at:19 January 2015, 1:01 PM"
"Course" "Time"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
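With read.delim, I specify skip=1 so that the second line is used as the header. Sometimes, however, a line such as line 11 (it could be other lines) should be skipped during the import. If there is a way, preferably in base R, I would like to: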
- skip the first line,
- use the second line as the header, and
- skip lines that do not start with "EDPY 301 (SEM J4 Wi14)".
For reference, here is the code I use to import the text file:
read.delim("path to the file", header = TRUE, stringsAsFactors = FALSE, strip.white = TRUE, na.strings = c("NA", ""), skip = 1)
Thanks,
I don't know of a way to conditionally exclude lines with read.table, but reading the file with readLines and building an inclusion vector with grep or grepl seems to work well:
Lines <- readLines(textConnection('"Saved at:19 January 2015, 1:01 PM"
"Course" "Time"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:28 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:27 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:26 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:25 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:02 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
The DI Module Exam contains 16 mul..."
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"
"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"'))
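# Indices of the lines that start with a double quote followed by EDPY
good <- grep("^\\\"EDPY", Lines)
# The header line is filtered out too, so the column names are supplied here
inp <- read.table(text=Lines[good], col.names = c("Course","Time" ))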
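The pattern string needs three backslashes after the start-of-line anchor ^: the first two produce a single backslash in the regex, and the third escapes the double quote inside the R string literal.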
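A minimal sketch that covers all three requirements for a file on disk, assuming R 3.3.0 or later (for startsWith()) and double-quoted fields separated by whitespace, as in the sample. read.table()'s default sep = "" treats both tabs and spaces as separators, and the quotes keep embedded spaces inside a field. The helper name read_course_rows() and the temporary file are hypothetical. Rather than hardcoding col.names, it reads the column names from line 2 with scan(), and it filters with startsWith(), a fixed-string prefix match that needs no regex escaping. (A single-quoted regex such as '^"EDPY' would also work, because a double quote is not a regex metacharacter.)

```r
# Sketch only: read_course_rows() is a hypothetical helper, not a base R function.
# Assumes R >= 3.3.0 (startsWith) and quoted, whitespace-separated fields.
read_course_rows <- function(path, prefix = '"EDPY 301 (SEM J4 Wi14)"') {
  Lines <- readLines(path)
  # Line 2 holds the column names; scan() splits on whitespace and drops the quotes
  header <- scan(text = Lines[2], what = "", quiet = TRUE)
  # Drop line 1 ("Saved at:...") and line 2 (the header) from the data lines
  body <- Lines[-(1:2)]
  # Fixed-string prefix match: no regex, so no backslash escaping is needed
  keep <- startsWith(body, prefix)
  read.table(text = body[keep], col.names = header,
             stringsAsFactors = FALSE, strip.white = TRUE)
}

# Usage with a shortened copy of the sample data, written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '"Saved at:19 January 2015, 1:01 PM"',
  '"Course" "Time"',
  '"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 7:28 PM"',
  '"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"',
  'The DI Module Exam contains 16 mul..."',
  '"EDPY 301 (SEM J4 Wi14)" "28 January 2014, 6:57 PM"'
), tmp)
inp <- read_course_rows(tmp)
str(inp)
```

Because read.table() only receives the filtered lines, the unbalanced quote on the truncated "The DI Module Exam..." line never reaches the parser.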