Reset cumprod when NA is encountered
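I have an xts object containing monthly stock returns. I want to calculate the rolling cumulative return of each stock. Some of the stocks have NAs in the data. Every time an NA is encountered, I want the cumulative return to reset to 1. Here is some sample data: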
rets<-read.table(text=
'Date,AFX SJ Equity,DSY SJ Equity
1996-12-31,0.000000000,0.0298516427
1997-01-31,-0.046874751,0.1173840351
1997-02-28,0.088537483,0.0080555362
1997-03-31,-0.003013021,0.2516612299
1997-04-30,-0.003022126,-0.0425537783
1997-05-30,-0.060610279,0.1222167814
1997-06-30,-0.030128416,0.0594070842
1997-07-31,-0.040264811,NA
1997-08-29,0.143354912,NA
1997-09-30,NA,NA
1997-10-31,0.023807612,0.0458311280
1997-11-28,0.011881887,0.1035818306
1997-12-31,0.023445977,-0.0729239783
1998-01-30,-0.064883184,-0.0007773145
1998-02-27,-0.020408576,0.0405326221
1998-03-31,0.124981915,0.1198516418
1998-04-30,0.081499173,-0.0167247568
1998-05-29,-0.143835151,0.1292490014
1998-06-30,-0.189289470,0.1198825615
1998-07-31,-0.130008077,NA
',sep=',',header=TRUE)
library(lubridate)
library(xts)
rets<-xts(rets[,-1],order.by=ymd(rets[,1]))
Here is what I tried:
cum_ret <- ifelse(is.na(rets)==T, 1, cumprod(1+rets))
This gives:
AFX.SJ.Equity DSY.SJ.Equity
[1,] 1.0000000 1.029852
[2,] 0.9531252 1.150740
[3,] 1.0375126 1.160010
[4,] 1.0343865 1.451939
[5,] 1.0312605 1.390154
[6,] 0.9687555 1.560054
[7,] 0.9395684 1.652732
[8,] 0.9017369 1.000000
[9,] 1.0310053 1.000000
[10,] 1.0000000 1.000000
[11,] NA NA
[12,] NA NA
[13,] NA NA
[14,] NA NA
[15,] NA NA
[16,] NA NA
[17,] NA NA
[18,] NA NA
[19,] NA NA
[20,] NA 1.000000
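This gives NA everywhere there is data after the first NA is encountered, and 1 wherever the original data has an NA.
The output I want should look like this: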
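The pattern above follows from how cumprod treats missing values: once an NA enters the running product, every later element is NA as well. A minimal sketch on a made-up vector (the values are illustrative, not taken from the sample data):

```r
# Toy returns with one missing value in the middle (illustrative values).
x <- c(0.10, NA, 0.20)

# cumprod() propagates NA: everything from the first NA onward is NA.
running <- cumprod(1 + x)

# ifelse() only swaps in 1 where the *input* was NA, so the value
# after the gap stays NA instead of restarting the product.
attempt <- ifelse(is.na(x), 1, running)
print(attempt)
```

So the ifelse call cannot repair the product after the fact; the product has to be restarted for each stretch of non-NA values.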
AFX SJ Equity DSY SJ Equity
1996-12-31 1.0000000 1.029852
1997-01-31 0.9531252 1.150740
1997-02-28 1.0375126 1.160010
1997-03-31 1.0343865 1.451939
1997-04-30 1.0312605 1.390154
1997-05-30 0.9687555 1.560054
1997-06-30 0.9395684 1.652732
1997-07-31 0.9017369 NA
1997-08-29 1.0310053 NA
1997-09-30 NA NA
1997-10-31 1.0238076 1.045831
1997-11-28 1.0359724 1.154160
1997-12-31 1.0602618 1.069994
1998-01-30 0.9914686 1.069163
1998-02-27 0.9712341 1.112499
1998-03-31 1.0926208 1.245833
1998-04-30 1.1816685 1.224997
1998-05-29 1.0117031 1.383327
1998-06-30 0.8201983 1.549163
1998-07-31 0.7135659 NA
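I don't have xts at hand, but this process should work just as well. (Because of that, I'm using lapply to work on rets as the plain data frame returned by read.table, before the xts conversion; you should be able to adapt this directly to your time series.)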
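# rets is the data frame from read.table(); column 1 (Date) is skipped
rets[,-1] <- lapply(rets[,-1], function(ret) {
  r <- rle(!is.na(ret))              # alternating runs of non-NA / NA
  r2 <- c(0, cumsum(r$lengths))      # cumulative run boundaries
  starts <- 1 + head(r2, n = -1)     # first index of each run
  ends <- r2[-1]                     # last index of each run
  # index sequences for the non-NA runs only
  seqs <- Map(seq, starts[r$values], ends[r$values])
  for (s in seqs) {
    ret[s] <- cumprod(1 + ret[s])    # compound within each run
  }
  ret
})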
rets
# Date AFX.SJ.Equity DSY.SJ.Equity
# 1 1996-12-31 1.0000000 1.029852
# 2 1997-01-31 0.9531252 1.150740
# 3 1997-02-28 1.0375126 1.160010
# 4 1997-03-31 1.0343865 1.451939
# 5 1997-04-30 1.0312605 1.390154
# 6 1997-05-30 0.9687555 1.560054
# 7 1997-06-30 0.9395684 1.652732
# 8 1997-07-31 0.9017369 NA
# 9 1997-08-29 1.0310053 NA
# 10 1997-09-30 NA NA
# 11 1997-10-31 1.0238076 1.045831
# 12 1997-11-28 1.0359724 1.154160
# 13 1997-12-31 1.0602618 1.069994
# 14 1998-01-30 0.9914686 1.069163
# 15 1998-02-27 0.9712341 1.112499
# 16 1998-03-31 1.0926208 1.245833
# 17 1998-04-30 1.1816685 1.224997
# 18 1998-05-29 1.0117031 1.383327
# 19 1998-06-30 0.8201983 1.549163
# 20 1998-07-31 0.7135659 NA
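The trick here is to use rle to find the runs of non-NA values in each vector (stored in the r variable ... though I shouldn't be using single-letter variable names). If we look at the first pass inside the lapply, we see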
r
# Run Length Encoding
# lengths: int [1:3] 9 1 10
# values : logi [1:3] TRUE FALSE TRUE
seqs
# [[1]]
# [1] 1 2 3 4 5 6 7 8 9
# [[2]]
# [1] 11 12 13 14 15 16 17 18 19 20
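To make the index bookkeeping concrete, here is a minimal sketch of the same rle steps on a short made-up vector, wrapped in a helper. The name reset_cumprod and the toy values are assumptions for illustration; they are not part of the answer above.

```r
# Hypothetical helper: cumulative product of (1 + x) that restarts
# after every run of NAs. NA positions are left as NA.
reset_cumprod <- function(x) {
  runs   <- rle(!is.na(x))              # alternating runs of non-NA / NA
  bounds <- c(0, cumsum(runs$lengths))  # cumulative run boundaries
  starts <- 1 + head(bounds, n = -1)    # first index of each run
  ends   <- bounds[-1]                  # last index of each run
  # Visit only the runs of non-NA values and compound within each one.
  for (i in which(runs$values)) {
    idx <- starts[i]:ends[i]
    x[idx] <- cumprod(1 + x[idx])
  }
  x
}

# Two returns, a two-month gap, then two more returns (made-up values).
toy <- c(0.10, 0.20, NA, NA, 0.50, 0.10)
res <- reset_cumprod(toy)
print(res)
```

Applied column by column, for example with apply(m, 2, reset_cumprod) on a numeric matrix m, this should behave like the lapply version. For an xts object the matrix could presumably be taken from coredata() first, though that was not tested here.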