How to do previous tick aggregation without aggregateTrades from highfrequency package

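I need to aggregate my tick data set to 5-minute intervals using previous tick aggregation. What I want is similar to what the aggregateTrades() function in the highfrequency package does, but because of some other data-handling issues I need to solve this without the highfrequency package. Here is my data set: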

dput(tt)
structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95, 
1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05, 
1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA, 
NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378, 
1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9, 
231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05, 
232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231, 
230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45, 
231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85, 
231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25, 
231.15, 231, 231, 231, 231.05, NA), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459481850, 
1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306, 
1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314, 
1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320, 
1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328, 
1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337, 
1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347, 
1459482348, 1459482349, 1459482350, 1459482351, 1459482354, 1459482355, 
1459482356, 1459482357, 1459482358, 1459482359, 1459482362, 1459482363, 
1459482364, 1459482369, 1459482370, 1459482371, 1459482372, 1459482373, 
1459482378, 1459482379, 1459482380, 1459482382, 1459482388), tzone = "Asia/Calcutta", tclass = c("POSIXct", 
"POSIXt")), .Dim = c(60L, 2L), .Dimnames = list(NULL, c("A", 
"B")), class = c("xts", "zoo"))

Here is my code for the previous tick aggregation:

ag.5min.tt<-tt%>%filter(as.Date(index(tt)))%>%lapply(aggregate(by=cut(format(index(tt), format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))

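What I am trying to do with the code above is, for each day, build 5-minute intervals for the prices of A and B. But I get an error. Please suggest how to fix it: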

Error in UseMethod("filter_") : 
  no applicable method for 'filter_' applied to an object of class "c('xts', 'zoo')" 

Thanks.

Edit: converting the xts object to a data frame:

tt<-as.data.frame(tt)
tt<-data.frame(Time=rownames(tt), coredata(tt))
ag.5min.tt<-tt%>% filter(as.Date(index(tt)))%>%lapply(aggregate(by=cut(format(index(tt), format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))

New error:

Error in eval(substitute(expr), envir, enclos) : 
  filter condition does not evaluate to a logical vector. 

Edit: I also tried:

tt$Time<- as.POSIXct(tt$Time, format="%Y-%m-%d %H:%M:%S")
ag.5min.tt<-tt%>% group_by(Time==as.Date(tt$Time))%>%lapply(aggregate(by=cut(format(tt$Time, format = "%H:%M:%S"), breaks = "5 mins", Fun=tail)))

Error:

Error in cut.default(format(tt$Time, format = "%H:%M:%S"), breaks = "5 mins",  : 
  'x' must be numeric
In addition: Warning message:
In eval(substitute(expr), envir, enclos) :
  Incompatible methods ("Ops.POSIXt", "Ops.Date") for "=="

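The result should look like this. Each 5-minute timestamp should carry the value observed at that timestamp or, if that value is NA or missing, the last non-NA value before it, for both stock A and stock B.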

  time                  A      B
1 2016-04-01 09:00:00      NA    NA
2 2016-04-01 09:05:00      NA    NA
3 2016-04-01 09:10:00      NA    NA
4 2016-04-01 09:15:00 1371.25 231.90
5 2016-04-01 09:20:00 1376.35 231.55

Convert the xts object to a data frame so it can be used with dplyr:

library(dplyr)
library(tibble)
library(xts)
library(tidyr)
dtf <- tt %>% 
    as.data.frame() %>%
    # add time information
    rownames_to_column("time") %>%
    mutate(time = as.POSIXct(time))

Generate the vector of times to pick: one every 5 minutes (300 seconds) between the minimum and maximum time.

timepick <- seq(trunc(min(dtf$time),"hour"), # start at the hour
                max(dtf$time)+300 , 300)

Use that vector of breaks to pick the last available observation at each 5-minute mark.

ag.5min.tt <- dtf %>%
    # Add missing interval
    full_join(tibble(time = timepick), by = "time") %>% # tibble() replaces the deprecated data_frame()
    arrange(time) %>% # important to arrange by time here
    # Replace each NA with the most recent non-NA
    fill(-time) %>% 
    # take selected values only
    filter(time %in% timepick) 

Convert back to an xts object:

ag.5min.tt <- ag.5min.tt %>% 
    as.data.frame() %>% 
    column_to_rownames("time") %>% 
    as.xts()
ag.5min.tt

                          A      B
2016-04-01 09:00:00      NA     NA
2016-04-01 09:05:00      NA     NA
2016-04-01 09:10:00 1371.25 231.90
2016-04-01 09:15:00 1371.25 231.90
2016-04-01 09:20:00 1378.80 231.05

You can use .indexmin to index your time series by minute, then manipulate that index to subset the observations:

ind <- which(diff(.indexmin(tt) %% 5) == -4)
res <- tt[ind]

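Here, .indexmin(tt) %% 5 returns the number of minutes elapsed since the last multiple of five minutes. For our purposes we want the last index of each run of 4s, which is the last observation in the minute just before each 5-minute mark. To get it, we take diff and use which to keep only the indices where the value crosses from 4 to 0 (a diff of -4).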
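
As a toy illustration of that index arithmetic, the sketch below applies the same steps to a made-up vector of minute values instead of a real xts index. The vector `minutes` is hypothetical and stands in for what .indexmin(tt) would return.

```r
# Hypothetical minute-of-hour values for eight consecutive observations
minutes <- c(14, 14, 15, 15, 19, 19, 20, 20)

# Minutes elapsed since the last multiple of five
mod5 <- minutes %% 5          # 4 4 0 0 4 4 0 0

# Positions where the value drops from 4 to 0, i.e. the last
# observation before each 5-minute mark
ind <- which(diff(mod5) == -4)
print(ind)                    # 2 6
```

Note that this only finds a position when some observation falls in the minute immediately before the mark, which is why the posted data needs to be modified for the demonstration.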

To illustrate, we modify the data you posted so that it contains observations that actually satisfy this extraction condition:

tt <- structure(c(1371.25, NA, 1373.95, NA, NA, 1373, NA, 1373.95, 
            1373.9, NA, NA, 1374, 1374.15, NA, 1374, 1373.85, 1372.55, 1374.05, 
            1374.15, 1374.75, NA, NA, 1375.9, 1374.05, NA, NA, NA, NA, NA, 
            NA, NA, 1375, NA, NA, NA, NA, NA, 1376.35, NA, NA, NA, NA, NA, 
            NA, NA, NA, NA, NA, 1376.25, NA, 1378, 1376.5, NA, NA, NA, 1378, 
            1378, NA, NA, 1378.8, 231.9, 231.85, NA, 231.9, 231.85, 231.9, 
            231.8, 231.9, 232.6, 231.95, 232.35, 232, 232.1, 232.05, 232.05, 
            232.05, 231.5, 231.3, NA, NA, 231.1, 231.1, 231.1, 231, 231, 
            230.95, 230.6, 230.6, 230.7, 230.6, 231, NA, 231, 231, 231.45, 
            231.65, 231.4, 231.7, 231.3, 231.25, 231.25, 231.4, 231.4, 231.85, 
            231.75, 231.5, 231.55, 231.35, NA, 231.5, 231.5, NA, 231.5, 231.25, 
            231.15, 231, 231, 231, 231.05, NA), .indexCLASS = c("POSIXct", 
                                                                "POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta", index = structure(c(1459482299, 
                                                                                                                                                                                     1459482301, 1459482302, 1459482303, 1459482304, 1459482305, 1459482306, 
                                                                                                                                                                                     1459482307, 1459482309, 1459482310, 1459482311, 1459482312, 1459482314, 
                                                                                                                                                                                     1459482315, 1459482316, 1459482317, 1459482318, 1459482319, 1459482320, 
                                                                                                                                                                                     1459482321, 1459482322, 1459482323, 1459482324, 1459482326, 1459482328, 
                                                                                                                                                                                     1459482329, 1459482330, 1459482331, 1459482332, 1459482336, 1459482337, 
                                                                                                                                                                                     1459482338, 1459482339, 1459482342, 1459482344, 1459482346, 1459482347, 
                                                                                                                                                                                     1459482348, 1459482349, 1459482590, 1459482591, 1459482594, 1459482595, 
                                                                                                                                                                                     1459482596, 1459482597, 1459482598, 1459482599, 1459482602, 1459482603, 
                                                                                                                                                                                     1459482604, 1459482609, 1459482610, 1459482611, 1459482612, 1459482613, 
                                                                                                                                                                                     1459482618, 1459482619, 1459482620, 1459482622, 1459482628), tzone = "Asia/Calcutta", tclass = c("POSIXct", 
                                                                                                                                                                                                                                                                                      "POSIXt")), .Dim = c(60L, 2L), .Dimnames = list(NULL,c("A", 
                                                                                                                                                                                                                                                                                                                                              "B")), class = c("xts", "zoo"))
##                          A      B
##2016-04-01 09:14:59 1371.25 231.90
##2016-04-01 09:15:01      NA 231.85
##2016-04-01 09:15:02 1373.95     NA
##2016-04-01 09:15:03      NA 231.90
##2016-04-01 09:15:04      NA 231.85
##2016-04-01 09:15:05 1373.00 231.90
##2016-04-01 09:15:06      NA 231.80
##2016-04-01 09:15:07 1373.95 231.90
##2016-04-01 09:15:09 1373.90 232.60
##2016-04-01 09:15:10      NA 231.95
##2016-04-01 09:15:11      NA 232.35
##2016-04-01 09:15:12 1374.00 232.00
##2016-04-01 09:15:14 1374.15 232.10
##2016-04-01 09:15:15      NA 232.05
##2016-04-01 09:15:16 1374.00 232.05
##2016-04-01 09:15:17 1373.85 232.05
##2016-04-01 09:15:18 1372.55 231.50
##2016-04-01 09:15:19 1374.05 231.30
##2016-04-01 09:15:20 1374.15     NA
##2016-04-01 09:15:21 1374.75     NA
##2016-04-01 09:15:22      NA 231.10
##2016-04-01 09:15:23      NA 231.10
##2016-04-01 09:15:24 1375.90 231.10
##2016-04-01 09:15:26 1374.05 231.00
##2016-04-01 09:15:28      NA 231.00
##2016-04-01 09:15:29      NA 230.95
##2016-04-01 09:15:30      NA 230.60
##2016-04-01 09:15:31      NA 230.60
##2016-04-01 09:15:32      NA 230.70
##2016-04-01 09:15:36      NA 230.60
##2016-04-01 09:15:37      NA 231.00
##2016-04-01 09:15:38 1375.00     NA
##2016-04-01 09:15:39      NA 231.00
##2016-04-01 09:15:42      NA 231.00
##2016-04-01 09:15:44      NA 231.45
##2016-04-01 09:15:46      NA 231.65
##2016-04-01 09:15:47      NA 231.40
##2016-04-01 09:15:48 1376.35 231.70
##2016-04-01 09:15:49      NA 231.30
##2016-04-01 09:19:50      NA 231.25
##2016-04-01 09:19:51      NA 231.25
##2016-04-01 09:19:54      NA 231.40
##2016-04-01 09:19:55      NA 231.40
##2016-04-01 09:19:56      NA 231.85
##2016-04-01 09:19:57      NA 231.75
##2016-04-01 09:19:58      NA 231.50
##2016-04-01 09:19:59      NA 231.55
##2016-04-01 09:20:02      NA 231.35
##2016-04-01 09:20:03 1376.25     NA
##2016-04-01 09:20:04      NA 231.50
##2016-04-01 09:20:09 1378.00 231.50
##2016-04-01 09:20:10 1376.50     NA
##2016-04-01 09:20:11      NA 231.50
##2016-04-01 09:20:12      NA 231.25
##2016-04-01 09:20:13      NA 231.15
##2016-04-01 09:20:18 1378.00 231.00
##2016-04-01 09:20:19 1378.00 231.00
##2016-04-01 09:20:20      NA 231.00
##2016-04-01 09:20:22      NA 231.05
##2016-04-01 09:20:28 1378.80     NA

With this data, we get:

print(res)
##                          A      B
##2016-04-01 09:14:59 1371.25 231.90
##2016-04-01 09:19:59      NA 231.55

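To get the output you posted, you first need to generate a time series that contains every 5-minute tick you want, with its data set to NA. For this example, that time series (covering only the 5-minute ticks from 09:00 to 09:20 on 2016-04-01) can be: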

every.5.min <- structure(c(NA, NA, NA, NA, NA), .Dim = c(5L, 1L), .Dimnames = list(
NULL, "Empty"), index = structure(c(1459481400, 1459481700, 
1459482000, 1459482300, 1459482600), tzone = "Asia/Calcutta", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "Asia/Calcutta", tzone = "Asia/Calcutta")
##                    Empty
##2016-04-01 09:00:00    NA
##2016-04-01 09:05:00    NA
##2016-04-01 09:10:00    NA
##2016-04-01 09:15:00    NA
##2016-04-01 09:20:00    NA
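
The same grid could probably also be built programmatically rather than typed out as a structure() call. The following is a minimal sketch, assuming the xts package is installed and that the "Asia/Calcutta" time zone name is available on your system; the start time and the count of five ticks are taken from this example only.

```r
library(xts)

# Five 5-minute ticks starting at 09:00 on 2016-04-01, in the data's time zone
grid <- seq(as.POSIXct("2016-04-01 09:00:00", tz = "Asia/Calcutta"),
            by = "5 min", length.out = 5)

# One all-NA column, indexed by the grid
every.5.min <- xts(rep(NA_real_, length(grid)), order.by = grid)
colnames(every.5.min) <- "Empty"
```

For real data you would likely derive the start and end of the grid from the index of tt instead of hard-coding them.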

Then merge this with tt:

tt <- merge(tt, every.5.min, all=TRUE)[,1:ncol(tt)]

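With all=TRUE, any 5-minute tick that does not already exist in tt is added as a new row filled with NA. Note that after the merge we keep only the columns of the original tt.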

Then, on tt, fill every NA with the previous value:

res <- do.call(merge, lapply(tt, na.locf))
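
To see what the fill step does in isolation, here is a small sketch on a plain numeric vector, assuming the zoo package (which xts builds on) is installed. The vector `x` is made up for illustration; na.rm = FALSE is passed so that a leading NA, which has no earlier value to carry forward, is kept rather than dropped.

```r
library(zoo)

# Hypothetical price column with gaps
x <- c(NA, 1371.25, NA, NA, 1376.35, NA)

# Last observation carried forward; the leading NA stays NA
filled <- na.locf(x, na.rm = FALSE)
print(filled)
```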

Finally, use .indexmin and .indexsec to extract only the rows that fall exactly on a 5-minute tick:

res <- res[.indexmin(res) %% 5 == 0 & .indexsec(res) == 0]
##                          A      B
##2016-04-01 09:00:00      NA     NA
##2016-04-01 09:05:00      NA     NA
##2016-04-01 09:10:00      NA     NA
##2016-04-01 09:15:00 1371.25 231.90
##2016-04-01 09:20:00 1376.35 231.55
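
For comparison, the previous-tick idea can also be sketched in base R with findInterval, without xts or dplyr. This is a different technique from the merge-and-fill approach above, not a drop-in replacement for it. The function name `prev_tick` and the toy inputs are hypothetical, and the sketch assumes `times` is sorted in increasing order.

```r
# For each grid time, return the last non-NA value observed at or before it.
# times:  numeric (or POSIXct) observation times, sorted increasing
# values: observed values, may contain NA
# grid:   times at which to sample
prev_tick <- function(times, values, grid) {
  ok  <- !is.na(values)
  # Position of the last non-NA observation at or before each grid time;
  # 0 means there is no earlier observation
  pos <- findInterval(as.numeric(grid), as.numeric(times[ok]))
  out <- rep(NA_real_, length(grid))
  out[pos > 0] <- values[ok][pos[pos > 0]]
  out
}

# Toy example: observations at t = 10, 20, 30, 40 with one gap
times  <- c(10, 20, 30, 40)
values <- c(1, NA, 3, 4)
grid   <- c(5, 25, 40)
res_toy <- prev_tick(times, values, grid)
print(res_toy)   # NA 1 4
```

Applied to the data in this question, it would be called once per column, for example with index(tt) as `times` and the 5-minute ticks as `grid`.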