Reading CSV in R with zoo
I have a CSV in the following format:
TICKER,PER,DATE,TIME,CLOSE
SYMBOL,1,20160104,1002,14180.0000000
SYMBOL,1,20160104,1003,14241.0000000
I want to read it in as a time series:
f <- function(a, b) {
  c <- paste(a, b)
  return(strptime(c, format = "%Y%m%d %H%M"))
}
d <- read.zoo("test.csv", FUN = f, index.column = list("DATE", "TIME"))
But I get the error "index does not match data". Why?
You need to specify header = TRUE and sep = ",", because, unlike in read.csv, they are not the defaults in read.zoo.
d <- read.zoo(text="TICKER,PER,DATE,TIME,CLOSE
SYMBOL,1,20160104,1002,14180.0000000
SYMBOL,1,20160104,1003,14241.0000000",
  FUN = f, index.column = list("DATE", "TIME"),
  header = TRUE, sep = ",")
d
#                     TICKER PER CLOSE
# 2016-01-04 10:02:00 SYMBOL 1   14180
# 2016-01-04 10:03:00 SYMBOL 1   14241
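To see why the defaults matter, it may help to look at read.table, which read.zoo uses to read the file. The following is only a sketch with base R; the variable names Lines, raw and ok are chosen here for illustration. With the default whitespace separator, each line of the sample data is read as a single field and the header row is treated as data, so the DATE and TIME columns cannot be found.

```r
# The same sample data as in the question, held in a string.
Lines <- "TICKER,PER,DATE,TIME,CLOSE
SYMBOL,1,20160104,1002,14180.0000000
SYMBOL,1,20160104,1003,14241.0000000"

# Defaults: sep = "" (whitespace) and no header detected, so every
# line is one field and the header row becomes a data row.
raw <- read.table(text = Lines)
dim(raw)

# With the header and separator given explicitly, the five columns
# are split and named as expected.
ok <- read.table(text = Lines, header = TRUE, sep = ",")
names(ok)
```

Comparing dim(raw) with names(ok) shows the difference between the two calls.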
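Character and numeric columns cannot both be part of the time series data, because the data portion of a zoo object is a matrix (and a matrix must be all numeric, all character, or all of some other single type); however, split= can be used to split on the character column into wide format. Also, we can avoid having to specify the function f by specifying format= and tz= instead. Finally, we must indicate that a header is present (header=) and that fields are separated by the "," character (sep=).
(Below we use text = Lines for reproducibility, but in practice replace it with "test.csv".)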
Lines <- "TICKER,PER,DATE,TIME,CLOSE
SYMBOL,1,20160104,1002,14180.0000000
SYMBOL,1,20160104,1003,14241.0000000"
library(zoo)
read.zoo(text = Lines, header = TRUE, sep = ",", index = c("DATE", "TIME"),
  split = "TICKER", format = "%Y%m%d %H%M", tz = "")
giving:
                    PER CLOSE
2016-01-04 10:02:00   1 14180
2016-01-04 10:03:00   1 14241
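The matrix restriction mentioned in this answer can be seen with base R alone. This is a small illustration, not zoo code; the names m and n are made up for the example.

```r
# Binding a character column and a numeric column into one matrix
# coerces every element to character.
m <- cbind(TICKER = c("SYMBOL", "SYMBOL"), CLOSE = c(14180, 14241))
typeof(m)

# Without the character column, the matrix stays numeric.
n <- cbind(PER = c(1, 1), CLOSE = c(14180, 14241))
typeof(n)
```

This is why keeping TICKER in the data turns the numeric columns into strings, and why splitting on it keeps PER and CLOSE numeric.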
Note: If you do want to use your function f, then omit format and tz and use:
read.zoo(text = Lines, header = TRUE, sep = ",", index = c("DATE", "TIME"),
  split = "TICKER", FUN = f)
This also works, i.e. read it into a data frame first and then read the data frame into a zoo object:
DF <- read.csv(text = Lines) # read.csv defaults to header=TRUE, sep=","
read.zoo(DF, index = c("DATE", "TIME"), split = "TICKER", FUN = f)
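In each variant the index is built by pasting the DATE and TIME fields together and parsing the result with the format "%Y%m%d %H%M". A minimal base R sketch of that step on its own follows; the vectors date and time are hypothetical stand-ins for the two columns, and tz = "UTC" is an assumption made here only so the result does not depend on the local time zone.

```r
# Hypothetical stand-ins for the DATE and TIME columns of the CSV.
date <- c(20160104, 20160104)
time <- c(1002, 1003)

# Paste the two fields with a space and parse them in one step.
# tz = "UTC" is an assumption for reproducibility.
stamp <- as.POSIXct(paste(date, time), format = "%Y%m%d %H%M", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")
```

as.POSIXct is used here instead of strptime only so that the result is a plain POSIXct vector; the parsing format is the same as in f.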