R: How do I fill gaps (holidays) in a daily stock exchange index time series with the previous day's values?
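I am using R to work with daily stock index time series from different countries. To compare various measures (correlation, causality, and so on) I need every series to have the same number of rows, but because holidays differ between countries, the number of rows varies from series to series.
I am working with .csv files downloaded from Yahoo Finance, for example: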
> head(sp)
> Date Open High Low Close Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000 1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000 1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
What I need is this: suppose, for example, that 2010-01-07 is a holiday. In that case the next row in the file (row 1285) is 2010-01-08:
> head(sp)
> Date Open High Low Close Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000 1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000 1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
>1285 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000 1144.98
I need to fill the gap at 2010-01-07 with the previous day's data, like this:
> head(sp)
> Date Open High Low Close Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000 1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000 1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
>1285 2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
>1284 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000 1144.98
How can I do this?
My code so far is below (it shows all the libraries I have tried in order to solve the problem):
>library(PerformanceAnalytics)
>library(tseries)
>library(urca)
>library(zoo)
>library(lmtest)
>library(timeDate)
>library(timeSeries)
>setwd("C:/Users/Fatima/Documents/R")
>sp = read.csv("SP500.csv", header = TRUE, stringsAsFactors = FALSE)
>sp$Date = as.Date(sp$Date)
>sp = sp[order(sp$Date), ]
Sorry for my poor English.
The xts package is useful here:
DF <- read.table(text = " Date Open High Low Close Volume Adj.Close
1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000 1132.99
1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000 1136.52
1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
1285 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000 1144.98", header = TRUE)
DF$Date <- as.Date(DF$Date)
library(xts)
X <- as.xts(DF[,-1], order.by = DF$Date)
na.locf(merge(X, seq(min(DF$Date), max(DF$Date), by = 1)))
# Open High Low Close Volume Adj.Close
#2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000 1132.99
#2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000 1136.52
#2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
#2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
#2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000 1144.98
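For readers who would rather not depend on xts, the same idea can be sketched in base R. This is only an illustration under stated assumptions: the names `all_days`, `full`, `last_obs`, and `filled` are made up for the example, only the Close and Volume columns are kept for brevity, and the first row is assumed to contain data (it does here, because the date sequence starts at the first observed date).

```r
# Sample rows from the question; 2010-01-07 is missing (holiday)
DF <- data.frame(
  Date   = as.Date(c("2010-01-04", "2010-01-05", "2010-01-06", "2010-01-08")),
  Close  = c(1132.99, 1136.52, 1137.14, 1144.98),
  Volume = c(3991400000, 2491020000, 4972660000, 4389590000)
)

# Every calendar day between the first and the last observation
all_days <- data.frame(Date = seq(min(DF$Date), max(DF$Date), by = "day"))

# Left join: days that are absent from DF get NA in the value columns
full <- merge(all_days, DF, by = "Date", all.x = TRUE)

# For each row, the index of the most recent row that has data
has_data <- !is.na(full$Close)
last_obs <- cummax(ifelse(has_data, seq_len(nrow(full)), 0L))

# Copy the values of that most recent row into every value column
filled <- full
filled[-1] <- lapply(full[-1], function(col) col[last_obs])
```

The `cummax` step is a common base R idiom for carrying the last observation forward; `na.locf` does the same job with less code once the series is an xts or zoo object.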
Edit:
In response to your comment: you can exclude weekends like this:
dates <- seq(min(DF$Date), max(DF$Date), by = 1)
#you might have to adjust the following to the translations in your locale
dates <- dates[!(weekdays(dates) %in% c("Saturday", "Sunday"))]
na.locf(merge(X, dates))
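Because `weekdays()` returns day names in the current locale, a locale-independent variant may be safer. The sketch below uses the numeric `wday` field of `as.POSIXlt` (0 = Sunday, 6 = Saturday); the date range and the names `is_weekend` and `weekdays_only` are chosen only for illustration.

```r
# One full week plus the following Monday
dates <- seq(as.Date("2010-01-04"), as.Date("2010-01-11"), by = "day")

# wday is numeric and does not depend on the locale: 0 = Sunday, 6 = Saturday
is_weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)

# Keep Monday to Friday only
weekdays_only <- dates[!is_weekend]
```

The resulting `weekdays_only` vector can be passed to `merge(X, ...)` in place of `dates` above.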
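Read the data in with read.zoo, then add the missing dates by merging with a zero-width zoo series that contains all the dates. Finally, use na.locf to fill in the NA values produced by the merge.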
Lines <- "Date,Open,High,Low,Close,Volume,Adj.Close
2010-01-04,1116.56,1133.87,1116.56,1132.99,3991400000,1132.99
2010-01-05,1132.66,1136.63,1129.66,1136.52,2491020000,1136.52
2010-01-06,1135.71,1139.19,1133.95,1137.14,4972660000,1137.14
2010-01-11,1140.52,1145.39,1136.22,1144.98,4389590000,1144.98"
library(zoo)
z <- read.zoo(text = Lines, header = TRUE, sep = ",")
zout <- na.locf( merge(z, zoo(, seq(start(z), end(z), by = "day"))) )
giving:
> zout
Open High Low Close Volume Adj.Close
2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000 1132.99
2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000 1136.52
2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
2010-01-08 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
2010-01-09 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
2010-01-10 1135.71 1139.19 1133.95 1137.14 4972660000 1137.14
2010-01-11 1140.52 1145.39 1136.22 1144.98 4389590000 1144.98
An alternative to the na.locf line is to use na.approx with method = "constant":
na.approx(z, xout = seq(start(z), end(z), by = "day"), method = "constant")
which gives the same answer.
Added: to set the weekend rows to NA:
library(chron)
zout[is.weekend(time(zout)), ] <- NA
or, to return only the weekdays:
library(chron)
zout[!is.weekend(time(zout))]
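Since the underlying goal is to give series from different countries the same rows, the merge-then-fill idea can also be applied to two series at once. This is a minimal sketch, assuming closing prices only: `sp` reuses the closes from the question, while `other` is a made-up second index with a holiday on a different day.

```r
library(zoo)

# S&P 500 closes from the question; 2010-01-07 is treated as a holiday
sp <- zoo(c(1132.99, 1136.52, 1137.14, 1144.98),
          as.Date(c("2010-01-04", "2010-01-05", "2010-01-06", "2010-01-08")))

# Hypothetical second index with a holiday on 2010-01-05 instead
other <- zoo(c(100, 101, 102, 103),
             as.Date(c("2010-01-04", "2010-01-06", "2010-01-07", "2010-01-08")))

# Outer join on the union of dates; a holiday in one market becomes NA there
both <- merge(sp, other, all = TRUE)

# Carry the last observation forward; na.rm = FALSE keeps any leading NAs
both <- na.locf(both, na.rm = FALSE)
```

After this step both columns share the same five dates. Note that a carried-forward price implies a zero return on the filled day, which can matter when computing correlations on returns.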