Extract the date and values to create a data frame
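Here are three sample data files (text files) from my folder. I am trying to extract the date that belongs to each data file and build a single data frame from all three files.
Here are the files:
File 1: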
### DEFAULTS ###
### OPTIONS ###
Title 2 left $ Date and time: 2010-02-03 12:00 UTC
### DATA ###
250 23.54300 0.00000 12.90000
500 17.47400 3.50000 21.70000
750 41.33200 22.10000 30.40000
File 2:
### DEFAULTS ###
### OPTIONS ###
Title 2 left $ Date and time: 2010-02-10 12:00 UTC
### DATA ###
250 36.95300 30.60000 27.10000
500 40.87700 37.80000 27.80000
750 41.46100 38.30000 30.70000
File 3:
### DEFAULTS ###
### OPTIONS ###
Title 2 left $ Date and time: 2010-02-17 12:00 UTC
### DATA ###
250 28.91200 24.90000 5.00000
500 49.82900 32.40000 11.10000
750 50.83600 40.60000 22.20000
Below is the code I wrote to extract the date.
### Start R Code for extracting date
setwd("C:/Users/")
path = "~C:/Users"
file.names<- dir("C:/Users/", pattern =".dat")
file.names
fileNameVector <- NULL
dateNameVector <- NULL
for(i in 1:length(file.names)){
x<-readLines(file.names[i])
x
date.list<-x[3]
date.list
xx<-strsplit(date.list,' ')
xx
date.fin<-unlist(xx)[13]
fileNameVector <- rbind(fileNameVector, file.names[i])
dateNameVector <- rbind(dateNameVector, date.fin)
}
finalDateName <- cbind(fileNameVector, dateNameVector)
finalDateName
And to extract the values:
setwd("C:/Users")
path = "~C:/Users"
st021ozone<-""
file.names<- dir("C:/Users", pattern =".dat")
for(i in 1:length(file.names)){
file <- read.table(file.names[i],skip=4,header=FALSE, sep=" ", stringsAsFactors=FALSE)
st021ozone <- rbind(st021ozone, file)
}
write.table(st021ozone, file = "st021ozone",sep=",",
row.names = FALSE, qmethod = "double",fileEncoding="windows-1252")
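I would like to combine these two pieces of code into one complete data frame, with the date on the left side of the table, in line with the 250 value of each file. The final data frame will have 9 rows and 5 columns.
This is the desired result: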
2010-02-03 250 23.54300 0.00000 12.90000
500 17.47400 3.50000 21.70000
750 41.33200 22.10000 30.40000
2010-02-10 250 36.95300 30.60000 27.10000
500 40.87700 37.80000 27.80000
750 41.46100 38.30000 30.70000
2010-02-17 250 28.91200 24.90000 5.00000
500 49.82900 32.40000 11.10000
750 50.83600 40.60000 22.20000
Thanks in advance for your help.
Edit: here is a sample of the actual file. I have 53 similar files:
$
### DEFAULTS ###
Color scale $
Contour levels $
Contour colors $
Contour type $
Map limits $
Map projection $
### OPTIONS ###
Image order $ 1
Title 1 left $ Case 0236-004 - Vertical
Vertical
Title 2 left $ Date and time: 2010-09-08
Receptor code: Town021
Title 3 left $
Title 4 left $
Title 1 right $
Title 2 right $ tranport
Title 3 right $ Start: 2010-01-01 00:00 UTC
Title 4 right $
gar-0236-004-01-20151103085553.png
### DATA ###
1 1 0 1 1.83400 32.00000 0.00000 21.20000
1 1 100 1 3.27300 0.00000 25.10000 21.90000
1 1 250 1 12.22200 0.00000 34.30000 25.60000
1 1 500 1 27.27400 0.00000 35.00000 31.30000
1 1 750 1 26.45300 0.00000 36.10000 35.90000
1 1 1000 1 32.62200 0.00000 36.40000 39.30000
1 1 1500 1 35.22700 0.00000 36.70000 42.20000
1 1 2000 1 37.90300 0.00000 37.40000 43.80000
1 1 3000 1 47.28200 0.00000 44.30000 46.90000
1 1 4000 1 51.01200 0.00000 49.00000 49.90000
1 1 5000 1 61.06500 0.00000 49.40000 51.00000
@alistaire Thank you for the link. Here is the code I tried to run.
setwd("C:/Users/")
path = "~C:/Users/"
test.sample<-""
files <- lapply(list.files(pattern = '\\.dat'), readLines)
test.sample<- rbind(test.sample, files)
do.call(rbind, lapply(files, function(lines){
# for each file, return a data.frame of the datetime, pulled with regex
data.frame(datetime = as.POSIXct(sub('^.*Date and time: ', '', lines[grep('Date and time:', lines)])),
# and the data, read in as text
read.table(text = lines[(grep('DATA', lines) + 1):length(lines)]))
}))
write.table(test.sample, file = "test.sample", sep="\t", qmethod ="double",row.names = FALSE,fileEncoding="windows-1252")
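I supplied a file name and a path. When I type the variable name 'test.sample' at the console, I get a matrix (see image for the result).
I think I am missing something. Note: here I used a folder containing 44 files.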
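Cases like this mostly come down to messy hacking, though there are patterns that can make life easier. Since these files are at least regular, you don't need much here to get something usable: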
# read files into a list
files <- lapply(paste0('File', 1:3, '.dat'), readLines)
# loop across files (into list) with `lapply`, recombine list with `do.call(rbind, ...`
do.call(rbind, lapply(files, function(lines){
# for each file, return a data.frame of the datetime, pulled with regex
data.frame(datetime = as.POSIXct(sub('^.*Date and time: ', '', lines[3])),
# and the data, read in as text
read.table(text = lines[5:length(lines)]))
}))
# datetime V1 V2 V3 V4
# 1 2010-02-03 12:00:00 250 23.543 0.0 12.9
# 2 2010-02-03 12:00:00 500 17.474 3.5 21.7
# 3 2010-02-03 12:00:00 750 41.332 22.1 30.4
# 4 2010-02-10 12:00:00 250 36.953 30.6 27.1
# 5 2010-02-10 12:00:00 500 40.877 37.8 27.8
# 6 2010-02-10 12:00:00 750 41.461 38.3 30.7
# 7 2010-02-17 12:00:00 250 28.912 24.9 5.0
# 8 2010-02-17 12:00:00 500 49.829 32.4 11.1
# 9 2010-02-17 12:00:00 750 50.836 40.6 22.2
Edit as necessary.
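If you want to try the approach without touching your real folder, here is a minimal self-contained sketch. It assumes the three sample files from the question, written to a temporary folder as File1.dat to File3.dat, and it keeps only the calendar date (via `as.Date`) because the desired result shows dates without times. The names `demo_dir`, `make_file` and `result` are mine, for illustration only.

```r
# Temporary folder holding copies of the three sample files from the question
demo_dir <- tempfile("dat_demo")
dir.create(demo_dir)

# Hypothetical helper: writes one file in the simple four-line-header layout
make_file <- function(name, date, rows) {
  writeLines(c("### DEFAULTS ###", "### OPTIONS ###",
               paste0("Title 2 left $ Date and time: ", date, " 12:00 UTC"),
               "### DATA ###", rows),
             file.path(demo_dir, name))
}

make_file("File1.dat", "2010-02-03",
          c("250 23.54300 0.00000 12.90000",
            "500 17.47400 3.50000 21.70000",
            "750 41.33200 22.10000 30.40000"))
make_file("File2.dat", "2010-02-10",
          c("250 36.95300 30.60000 27.10000",
            "500 40.87700 37.80000 27.80000",
            "750 41.46100 38.30000 30.70000"))
make_file("File3.dat", "2010-02-17",
          c("250 28.91200 24.90000 5.00000",
            "500 49.82900 32.40000 11.10000",
            "750 50.83600 40.60000 22.20000"))

# Escape the dot and anchor the pattern so only names ending in ".dat" match
paths <- list.files(demo_dir, pattern = "\\.dat$", full.names = TRUE)

result <- do.call(rbind, lapply(paths, function(p) {
  lines <- readLines(p)
  # text after the "Date and time: " label on the first matching line
  stamp <- sub("^.*Date and time: ", "",
               grep("Date and time:", lines, value = TRUE)[1])
  # assumption: the date is the first 10 characters, in YYYY-MM-DD form
  data.frame(date = as.Date(substr(stamp, 1, 10)),
             read.table(text = lines[5:length(lines)]))
}))

print(result)
```

The date is recycled down the rows of each file, so every row carries its date rather than only the 250 row; that is usually easier to work with afterwards than the blank cells in the desired layout.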
Edit
For the new file structure, now less dependent on line numbers:
do.call(rbind, lapply(files, function(lines){
# for each file, return a data.frame of the datetime, pulled with regex
data.frame(datetime = as.POSIXct(sub('^.*Date and time: ', '', lines[grep('Date and time:', lines)])),
# and the data, read in as text
read.table(text = lines[(grep('DATA', lines) + 1):length(lines)]))
}))
# datetime V1 V2 V3 V4 V5 V6 V7 V8
# 1 2010-09-08 1 1 0 1 1.834 32 0.0 21.2
# 2 2010-09-08 1 1 100 1 3.273 0 25.1 21.9
# 3 2010-09-08 1 1 250 1 12.222 0 34.3 25.6
# 4 2010-09-08 1 1 500 1 27.274 0 35.0 31.3
# 5 2010-09-08 1 1 750 1 26.453 0 36.1 35.9
# 6 2010-09-08 1 1 1000 1 32.622 0 36.4 39.3
# 7 2010-09-08 1 1 1500 1 35.227 0 36.7 42.2
# 8 2010-09-08 1 1 2000 1 37.903 0 37.4 43.8
# 9 2010-09-08 1 1 3000 1 47.282 0 44.3 46.9
# 10 2010-09-08 1 1 4000 1 51.012 0 49.0 49.9
# 11 2010-09-08 1 1 5000 1 61.065 0 49.4 51.0
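On the follow-up in the question: the `do.call(rbind, ...)` result there is printed but never stored, and `test.sample` is built with `rbind("", files)`, which gives a list matrix of raw lines rather than a data frame. A hedged sketch of storing the result and writing it out follows. It assumes one abbreviated file in the newer layout inside a temporary folder; `folder`, `sample1.dat`, `combined`, `out` and the extra `file` column are my own illustrative names.

```r
# Hypothetical demo folder with one shortened file in the newer layout
folder <- tempfile("dat_demo")
dir.create(folder)
writeLines(c("$",
             "### DEFAULTS ###",
             "Color scale $",
             "### OPTIONS ###",
             "Title 2 left $ Date and time: 2010-09-08",
             "Receptor code: Town021",
             "Title 3 right $ Start: 2010-01-01 00:00 UTC",
             "### DATA ###",
             "1 1 0 1 1.83400 32.00000 0.00000 21.20000",
             "1 1 100 1 3.27300 0.00000 25.10000 21.90000"),
           file.path(folder, "sample1.dat"))

paths <- list.files(folder, pattern = "\\.dat$", full.names = TRUE)

# Assign the combined data frame to a variable instead of only printing it
combined <- do.call(rbind, lapply(paths, function(p) {
  lines <- readLines(p)
  # [1] guards against more than one matching line in a file
  stamp <- sub("^.*Date and time: ", "",
               lines[grep("Date and time:", lines)[1]])
  first_data <- grep("DATA", lines)[1] + 1
  data.frame(file = basename(p),                       # which file each row came from
             date = as.Date(substr(stamp, 1, 10)),     # assumes YYYY-MM-DD
             read.table(text = lines[first_data:length(lines)]))
}))

# Write the data frame itself, not the list of raw lines
out <- file.path(folder, "combined.txt")
write.table(combined, file = out, sep = "\t", row.names = FALSE,
            qmethod = "double")
```

With a real folder, replace `folder` with its path and skip the `writeLines` step; the rest should carry over as long as every file contains a "Date and time:" line and a "DATA" marker.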