R: from a number of .csv to a single time series in xts
I have more than 100 csv files in the current directory, all with the same structure. Some examples:
ABC.csv
,close,high,low,open,time,volumefrom,volumeto,timestamp
0,0.05,0.05,0.05,0.05,1405555200,100.0,5.0,2014-07-17 02:00:00
1,0.032,0.05,0.032,0.05,1405641600,500.0,16.0,2014-07-18 02:00:00
2,0.042,0.05,0.026,0.032,1405728000,12600.0,599.6,2014-07-19 02:00:00
...
1265,0.6334,0.6627,0.6054,0.6266,1514851200,6101389.25,3862059.89,2018-01-02 01:00:00
XYZ.csv
,close,high,low,open,time,volumefrom,volumeto,timestamp
0,0.0003616,0.0003616,0.0003616,0.0003616,1412640000,11.21,0.004054,2014-10-07 02:00:00
...
1183,0.0003614,0.0003614,0.0003614,0.0003614,1514851200,0.0,0.0,2018-01-02 01:00:00
My idea is to build a time series dataset in xts in R, so that I can use the PerformanceAnalytics and quantmod libraries. Something like this:
## ABC XYZ ... ... JKL
## 2006-01-03 NaN 20.94342
## 2006-01-04 NaN 21.04486
## 2006-01-05 9.728111 21.06047
## 2006-01-06 9.979226 20.99804
## 2006-01-09 9.946529 20.95903
## 2006-01-10 10.575626 21.06827
## ...
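Any ideas? I can provide my attempt if needed.
A solution using base R
If you know the files all have the same format, you can merge them. Here is what I would do.
Get the list of files (assuming that all the .csv files in the working directory are ones you actually need):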
# pattern is a regular expression, not a glob, so anchor on the literal ".csv" extension
vcfl <- list.files(pattern = "\\.csv$")
Use lapply() to open all the files and store them as data frames:
lsdf <- lapply(vcfl, read.csv)
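To try the two steps above without touching real data, here is a minimal sketch that writes two tiny files into a temporary directory (the rows are copied from the ABC.csv and XYZ.csv samples in the question) and reads them back. The directory name demo_dir and the idea of naming the list elements after the files are my own additions, not something the steps above require.

```r
# Throwaway directory holding two small copies of the sample files
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c(
  ",close,high,low,open,time,volumefrom,volumeto,timestamp",
  "0,0.05,0.05,0.05,0.05,1405555200,100.0,5.0,2014-07-17 02:00:00",
  "1,0.032,0.05,0.032,0.05,1405641600,500.0,16.0,2014-07-18 02:00:00"
), file.path(demo_dir, "ABC.csv"))
writeLines(c(
  ",close,high,low,open,time,volumefrom,volumeto,timestamp",
  "0,0.0003616,0.0003616,0.0003616,0.0003616,1412640000,11.21,0.004054,2014-10-07 02:00:00"
), file.path(demo_dir, "XYZ.csv"))

# List the files with their full paths, then read each one into a data frame
vcfl <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
lsdf <- lapply(vcfl, read.csv, stringsAsFactors = FALSE)

# Name each list element after its file, without directory or extension
names(lsdf) <- tools::file_path_sans_ext(basename(vcfl))

str(lsdf$ABC)
```

Naming the list this way means the series names travel with the data, so lsdf$ABC and lsdf$XYZ can be inspected directly before merging.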
Merge them. Here I used the high column, but you can apply the same code to any variable (there may be a solution without a loop):
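out_high <- lsdf[[1]][, c("timestamp", "high")]
for (i in seq_along(vcfl)[-1]) {
  # all = TRUE keeps timestamps that are missing from some files (filled with NA);
  # the suffixes keep the column names unique until they are renamed
  out_high <- merge(out_high, lsdf[[i]][, c("timestamp", "high")], by = "timestamp",
                    all = TRUE, suffixes = c("", paste0(".", i)))
}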
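As for a solution without a loop, one possibility is Reduce() from base R. The sketch below uses toy data frames in place of the files that were read in (the values are made up for illustration), and the helper pick_column and the list pieces are hypothetical names of mine. It renames each column before merging, so no suffixes are needed, and it uses all = TRUE so that timestamps missing from a series show up as NA.

```r
# Toy stand-ins for the data frames read from the csv files (made-up values)
lsdf <- list(
  ABC = data.frame(timestamp = c("2014-07-17 02:00:00", "2014-07-18 02:00:00"),
                   high = c(0.05, 0.05)),
  XYZ = data.frame(timestamp = c("2014-07-18 02:00:00", "2014-07-19 02:00:00"),
                   high = c(0.0003616, 0.0003620))
)

# Keep timestamp plus one column, and name that column after the series
pick_column <- function(df, series, column = "high") {
  out <- df[, c("timestamp", column)]
  names(out)[2] <- series
  out
}
pieces <- Map(pick_column, lsdf, names(lsdf))

# Full outer join across all pieces; missing observations become NA
out_high <- Reduce(function(x, y) merge(x, y, by = "timestamp", all = TRUE), pieces)
print(out_high)
```

With 100+ files the repeated merge can get slow, but for daily data of this size it should still be workable.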
Rename the columns using the vector of file names:
# column 1 is timestamp; the remaining columns follow the order of vcfl
names(out_high)[-1] <- sub("\\.csv$", "", vcfl)
You can now use as.xts() from the xts package:
https://cran.r-project.org/web/packages/xts/xts.pdf
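A minimal sketch of that last conversion, assuming the xts package is installed and that out_high looks like the merged data frame above (toy values here). I use the xts() constructor with an explicit order.by rather than as.xts(), and I assume the data are daily, so the time-of-day part of the timestamp can be dropped; if your series are intraday, you would want as.POSIXct() with a suitable time zone instead.

```r
library(xts)

# Toy version of the merged data frame (made-up values)
out_high <- data.frame(
  timestamp = c("2014-07-17 02:00:00", "2014-07-18 02:00:00", "2014-07-19 02:00:00"),
  ABC = c(0.05, 0.05, 0.05),
  XYZ = c(NA, 0.0003616, 0.0003620)
)

# Daily data assumed: keep only the date part of each timestamp
idx <- as.Date(substr(out_high$timestamp, 1, 10))

# Build the xts object from the numeric columns only, indexed by date
high_xts <- xts(as.matrix(out_high[, -1]), order.by = idx)
print(high_xts)
```

The result has one column per series and NA wherever a series has no observation, which is the shape shown in the question.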
I imagine there is an alternative solution using tidyverse; anyone else?
Hope this helps.