Select a range of 5 mins by date and time using R
I have time series data in the following format:
Ask Bid Trade Ask_Size Bid_Size Trade_Size
2016-11-01 01:00:03 NA 938.10 NA NA 203 NA
2016-11-01 01:00:04 NA 937.20 NA NA 100 NA
2016-11-01 01:00:04 938.00 NA NA 28 NA NA
2016-11-01 01:00:04 NA 938.10 NA NA 203 NA
2016-11-01 01:00:04 939.00 NA NA 11 NA NA
2016-11-01 01:00:05 NA 938.15 NA NA 19 NA
2016-11-01 01:00:06 NA 937.20 NA NA 100 NA
2016-11-01 01:00:06 938.00 NA NA 28 NA NA
2016-11-01 01:00:06 NA NA 938.10 NA NA 69
2016-11-01 01:00:06 NA NA 938.10 NA NA 831
2016-11-01 01:00:06 NA 938.10 NA NA 134 NA
The structure of the time series data is:
str(df_ts)
An ‘xts’ object on 2016-11-01 01:00:03/2016-11-02 12:59:37 containing:
Data: num [1:35797, 1:6] NA NA 938 NA 939 NA NA 938 NA NA ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:6] "Ask" "Bid" "Trade" "Ask_Size" ...
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
NULL
How can I create 5-minute subsets of this time series? The start and end times will be defined by the user.
Sample data can be found at https://www.dropbox.com/s/m94y6pbhjlkny1l/Sample_HFT.csv?dl=0
Please help.
You can use lubridate together with aggregate. I assume your timestamps (date and time) are in the first column, which I have named "timestamp", and that the data frame is called df. Install the lubridate package first. The results are stored in a separate data frame, df2.
library(lubridate)
# Round each timestamp up to the next 5-minute boundary
df$timestamp <- ceiling_date(as.POSIXct(df$timestamp), unit = "5 minutes")
# Sum every other column within each 5-minute bucket and store the result in df2.
# na.action = na.pass keeps rows that contain NA; na.rm = TRUE makes sum() skip them.
df2 <- aggregate(. ~ timestamp, data = df, FUN = sum, na.rm = TRUE, na.action = na.pass)
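Since df_ts is already an xts object, a window with a user-defined start and end can also be cut directly from its index, without aggregating. The following is a minimal sketch under a few assumptions: the data below is an invented stand-in for df_ts (two columns, one row every 30 seconds, UTC), and subset_range is a hypothetical helper name, not part of xts.

```r
library(xts)

# Stand-in for df_ts: values are invented for illustration only
idx <- as.POSIXct("2016-11-01 01:00:00", tz = "UTC") + seq(0, 900, by = 30)
df_ts <- xts(cbind(Bid = 938 + seq_along(idx) / 100,
                   Bid_Size = seq_along(idx)),
             order.by = idx)

# Hypothetical helper: keep rows whose timestamp lies in [start_time, end_time]
subset_range <- function(x, start_time, end_time) {
  keep <- index(x) >= start_time & index(x) <= end_time
  x[keep, ]
}

# User-defined start; the end is 5 minutes (300 seconds) later
start_time <- as.POSIXct("2016-11-01 01:02:00", tz = "UTC")
end_time   <- start_time + 5 * 60

five_min <- subset_range(df_ts, start_time, end_time)

# xts also accepts an ISO 8601 range string, interpreted in the index time zone;
# this should select the same rows:
# df_ts["2016-11-01 01:02:00/2016-11-01 01:07:00"]
```

Both endpoints are inclusive here; change `<=` to `<` if the end of the window should be excluded. The time zone passed to as.POSIXct should match the time zone of the xts index, otherwise the window will be shifted.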