Use split on a data frame with POSIXlt as criteria

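I am trying to split and aggregate some data based on time.

There is some extra information here that should not interfere with this post. I want to split the data by FiveMinBar and then take the first Open, the maximum High, the minimum Low, and the last Close, along with the last FiveMinBar.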

            Date  Time  Open  High   Low Close            DateTime          FiveMinBar
10173 2000-01-03 09:31 70.00 70.00 69.88 70.00 2000-01-03 09:31:00 2000-01-03 09:35:00
10174 2000-01-03 09:32 70.00 70.00 69.50 70.00 2000-01-03 09:32:00 2000-01-03 09:35:00
10175 2000-01-03 09:33 69.94 70.00 69.50 70.00 2000-01-03 09:33:00 2000-01-03 09:35:00
10176 2000-01-03 09:34 70.00 70.00 69.38 70.00 2000-01-03 09:34:00 2000-01-03 09:35:00
10177 2000-01-03 09:35 70.00 70.00 69.50 69.81 2000-01-03 09:35:00 2000-01-03 09:35:00
10178 2000-01-03 09:36 69.81 69.88 68.75 68.75 2000-01-03 09:36:00 2000-01-03 09:40:00
10179 2000-01-03 09:37 68.75 69.06 68.75 68.75 2000-01-03 09:37:00 2000-01-03 09:40:00
10180 2000-01-03 09:38 68.81 69.06 68.56 68.63 2000-01-03 09:38:00 2000-01-03 09:40:00
10181 2000-01-03 09:39 68.56 69.00 68.50 68.56 2000-01-03 09:39:00 2000-01-03 09:40:00
10182 2000-01-03 09:40 68.56 69.00 68.13 68.13 2000-01-03 09:40:00 2000-01-03 09:40:00
10183 2000-01-03 09:41 68.63 68.63 67.75 67.88 2000-01-03 09:41:00 2000-01-03 09:45:00
10184 2000-01-03 09:42 68.00 68.06 67.25 67.38 2000-01-03 09:42:00 2000-01-03 09:45:00
10185 2000-01-03 09:43 67.38 67.38 67.00 67.19 2000-01-03 09:43:00 2000-01-03 09:45:00
10186 2000-01-03 09:44 67.13 67.25 66.75 66.81 2000-01-03 09:44:00 2000-01-03 09:45:00
10187 2000-01-03 09:45 66.88 67.25 66.00 66.31 2000-01-03 09:45:00 2000-01-03 09:45:00

My first attempt was to do this using sapply together with

split(data, data$FiveMinBar)

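However, split does not work with POSIXlt data. I did come up with the following solution, but it is far from "R optimal": it creates an empty data frame, has to coerce the date-time to numeric and then back to POSIXlt, and uses a for loop.

My solution: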

# Start with an empty data frame and append one row per five-minute bar.
results <- data.frame(Open=numeric(), High=numeric(), Low=numeric(), Close=numeric(),
                      DateTime=numeric())

for (i in seq_along(unique(data$FiveMinBar))){
  # Rows belonging to the i-th distinct bar
  temp <- data[data$FiveMinBar == unique(data$FiveMinBar)[i],]
  Open <- temp$Open[1]
  High <- max(temp$High)
  Low <- min(temp$Low)
  Close <- temp$Close[nrow(temp)]
  # Stored as numeric so that cbind/rbind keep the value
  DateTime <- as.numeric(temp$DateTime[nrow(temp)])
  results <- rbind(results, cbind(Open, High, Low, Close, DateTime))
}

# Convert the numeric seconds back to a date-time
results$DateTime <- as.POSIXlt(results$DateTime, origin="1970-01-01")

The result looks like this:

    Open  High   Low Close            DateTime
1  70.00 70.00 69.38 69.81 2000-01-03 09:35:00
2  69.81 69.88 68.13 68.13 2000-01-03 09:40:00
3  68.63 68.63 66.00 66.31 2000-01-03 09:45:00
4  66.25 66.50 65.63 65.81 2000-01-03 09:50:00
5  65.88 65.88 64.25 64.36 2000-01-03 09:55:00
6  64.31 64.38 63.25 63.44 2000-01-03 10:00:00
7  63.44 64.50 63.25 64.19 2000-01-03 10:05:00
8  64.25 64.63 63.75 64.44 2000-01-03 10:10:00
9  64.63 64.94 64.19 64.81 2000-01-03 10:15:00
10 64.88 65.25 64.56 65.13 2000-01-03 10:20:00

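Is there a cleaner way to do this? I would prefer to keep the data as a data frame rather than converting to xts.

Thanks.

Here is the code to recreate the initial data frame: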

data <- structure(list(Date = structure(c(10959, 10959, 10959, 10959, 
10959, 10959, 10959, 10959, 10959, 10959, 10959, 10959, 10959, 
10959, 10959), class = "Date"), Time = c("09:31", "09:32", "09:33", 
"09:34", "09:35", "09:36", "09:37", "09:38", "09:39", "09:40", 
"09:41", "09:42", "09:43", "09:44", "09:45"), Open = c(70, 70, 
69.94, 70, 70, 69.81, 68.75, 68.81, 68.56, 68.56, 68.63, 68, 
67.38, 67.13, 66.88), High = c(70, 70, 70, 70, 70, 69.88, 69.06, 
69.06, 69, 69, 68.63, 68.06, 67.38, 67.25, 67.25), Low = c(69.88, 
69.5, 69.5, 69.38, 69.5, 68.75, 68.75, 68.56, 68.5, 68.13, 67.75, 
67.25, 67, 66.75, 66), Close = c(70, 70, 70, 70, 69.81, 68.75, 
68.75, 68.63, 68.56, 68.13, 67.88, 67.38, 67.19, 66.81, 66.31
), DateTime = structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0), min = 31:45, hour = c(9L, 9L, 9L, 9L, 9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), mday = c(3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = c(100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L), wday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), yday = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), isdst = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt", 
"POSIXt")), FiveMinBar = structure(list(sec = c(0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(35L, 35L, 35L, 35L, 35L, 
40L, 40L, 40L, 40L, 40L, 45L, 45L, 45L, 45L, 45L), hour = c(9L, 
9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), mday = c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), mon = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), year = c(100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 
100L, 100L, 100L), wday = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), yday = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), isdst = c(0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min", 
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), tzone = c("", 
"EST", "EDT"), class = c("POSIXlt", "POSIXt"))), .Names = c("Date", 
"Time", "Open", "High", "Low", "Close", "DateTime", "FiveMinBar"
), row.names = 10173:10187, class = "data.frame")

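The problem is really that you have POSIXlt values in a data.frame. A POSIXlt object is stored as a list, and since data.frames are themselves lists, you end up with a list of lists, which is not easy to work with. If you want to store date/time values in a data.frame, it is better to use the sibling data type POSIXct, which is stored as a simple vector rather than a list.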
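To see the difference for yourself, here is a minimal sketch. The names x_lt and x_ct and the UTC time zone are only illustrative and are not taken from the question.

```r
# Illustrative values only; x_lt and x_ct are hypothetical names.
x_lt <- as.POSIXlt("2000-01-03 09:35:00", tz = "UTC")
x_ct <- as.POSIXct(x_lt)

# POSIXlt: a list of components (sec, min, hour, ...) underneath the class.
is.list(unclass(x_lt))
names(unclass(x_lt))[1:3]

# POSIXct: a plain double vector holding seconds since 1970-01-01 UTC.
typeof(x_ct)
as.numeric(x_ct)
```

Because the POSIXct value is a single number per observation, it behaves like any other atomic column when a data.frame is subset or grouped.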

In the example above, you can convert the column with
data$FiveMinBar <- as.POSIXct(data$FiveMinBar)

Then the split should work without a problem

split(data, data$FiveMinBar)
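From there, one possible way to get the open/high/low/close summary without the for loop is to apply a small function to each piece and bind the results. This is only a sketch under some assumptions: the data frame bars below is a hypothetical stand-in built from the Open, High, Low, Close and FiveMinBar values in the question's table, the UTC time zone is chosen just to make the example reproducible, and the names bars, pieces and ohlc are illustrative.

```r
# Hypothetical stand-in for the question's data, with FiveMinBar as POSIXct.
bars <- data.frame(
  Open  = c(70, 70, 69.94, 70, 70, 69.81, 68.75, 68.81, 68.56, 68.56,
            68.63, 68, 67.38, 67.13, 66.88),
  High  = c(70, 70, 70, 70, 70, 69.88, 69.06, 69.06, 69, 69,
            68.63, 68.06, 67.38, 67.25, 67.25),
  Low   = c(69.88, 69.5, 69.5, 69.38, 69.5, 68.75, 68.75, 68.56, 68.5, 68.13,
            67.75, 67.25, 67, 66.75, 66),
  Close = c(70, 70, 70, 70, 69.81, 68.75, 68.75, 68.63, 68.56, 68.13,
            67.88, 67.38, 67.19, 66.81, 66.31),
  FiveMinBar = as.POSIXct(
    rep(c("2000-01-03 09:35:00", "2000-01-03 09:40:00", "2000-01-03 09:45:00"),
        each = 5),
    tz = "UTC")
)

# One data frame per five-minute bar.
pieces <- split(bars, bars$FiveMinBar)

# Summarise each piece into a single row, then stack the rows.
ohlc <- do.call(rbind, lapply(pieces, function(p) {
  data.frame(
    Open     = p$Open[1],             # first open in the bar
    High     = max(p$High),           # highest high
    Low      = min(p$Low),            # lowest low
    Close    = p$Close[nrow(p)],      # last close
    DateTime = p$FiveMinBar[nrow(p)]  # bar timestamp, still POSIXct
  )
}))
rownames(ohlc) <- NULL
ohlc
```

This relies on the rows within each bar already being in time order, as they are in the question's data. The result stays an ordinary data frame, so no conversion to xts is needed.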