How to calculate and extract prices from timestamps in R

Question

I have commodity price data, shown in the table below:

Date      Time         Price
19990104  14:11:14.34  220 
19990104  14:11:21.21  200 
19990104  14:11:36.35  221  
19990104  14:11:45.45  202  
19990104  14:11:56.11  215

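As you can see, the time is 14 hours, 11 minutes, x seconds, plus .xx fractions of a second. I am trying to find the first, last, highest, and lowest value within each minute. My data covers thousands of days and minutes; the above is just an excerpt.

I would therefore like to create one row that holds all of this information. For the table above, the result should be: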

Date     Time      Start End  Low High
19990104 14:11:00  220   215  200 221

Any help is appreciated. Thank you!

Answer 1

How about this:

dat <- tibble::tribble(
  ~Date,      ~Time,      ~Price,
19990104,  "14:11:14", 220, 
19990104,  "14:11:21",  200, 
19990104,  "14:11:36", 221,  
19990104,  "14:11:45",  202 , 
19990104,  "14:11:56", 215)  

library(lubridate)
library(dplyr)

dat %>% 
  mutate(hms = hms(Time), 
         hour = hour(hms), 
         minute = minute(hms), 
         Time = hm(paste(hour, minute, sep=":"))) %>% 
  group_by(Date, Time) %>% 
  summarise(Start = first(Price), 
            End = last(Price), 
            Low = min(Price), 
            High = max(Price)) 

# `summarise()` has grouped output by 'Date'. You can override using the `.groups` argument.
# # A tibble: 1 x 6
# # Groups:   Date [1]
#       Date Time       Start   End   Low  High
#      <dbl> <Period>   <dbl> <dbl> <dbl> <dbl>  
# 1 19990104 14H 11M 0S   220   215   200   221

Answer 2

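First, converting the Date and Time fields to a single POSIXt-class object may be the better approach. It is a good choice if you will at some point need Date+Time as a number-like field (for example, to plot something over time). It is not required, but in my experience I almost always need to treat time numerically (and usually need the date with it).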
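The POSIXt route is described above but not shown, so here is a minimal base R sketch of it. It is an illustration rather than this answer's own code: the column names `DateTime` and `Minute`, the result name `ohlc`, and the choice of UTC as the time zone are all assumptions, and the sample rows are copied from the question with their fractional seconds kept.

```r
# Sample rows copied from the question, keeping the fractional seconds.
dat <- data.frame(
  Date  = rep(19990104L, 5),
  Time  = c("14:11:14.34", "14:11:21.21", "14:11:36.35",
            "14:11:45.45", "14:11:56.11"),
  Price = c(220, 200, 221, 202, 215)
)

# Assumption: timestamps are treated as UTC; use the real time zone if known.
# %OS accepts fractional seconds on input.
dat$DateTime <- as.POSIXct(paste(dat$Date, dat$Time),
                           format = "%Y%m%d %H:%M:%OS", tz = "UTC")

# Truncate each timestamp to the start of its minute.
dat$Minute <- as.POSIXct(trunc(dat$DateTime, units = "mins"))

# Sort chronologically so "first" and "last" mean earliest and latest.
dat <- dat[order(dat$DateTime), ]

# Build one summary row per minute and stack the rows.
ohlc <- do.call(rbind, lapply(split(dat, dat$Minute), function(g) {
  data.frame(Minute = g$Minute[1],
             Start  = g$Price[1],
             End    = g$Price[nrow(g)],
             Low    = min(g$Price),
             High   = max(g$Price))
}))
ohlc
```

Because `Minute` is a real date-time value, it can be used directly on a plot axis or in time arithmetic, which is the benefit described above.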

If you do not want or need to convert to a POSIXt or time class, you can do the following. (I added a couple of data rows to show multiple summary rows.)

Base R

dat$min <- substr(dat$Time, 1, 5)
aggregate(dat$Price, dat[,c("Date","min")], function(Price) c(Start=Price[1], End=Price[length(Price)], Low=min(Price), High=max(Price)))
#       Date   min x.Start x.End x.Low x.High
# 1 19990104 14:11     220   215   200    221
# 2 19990104 14:12     229   209   209    229
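One detail of the base R result is easy to miss: when the summary function returns several values, `aggregate()` stores them in a single matrix column named `x`, even though the printout looks like four separate columns. The sketch below rebuilds the data from the Data section of this answer and flattens that matrix with `do.call(data.frame, ...)`; the names `res` and `flat` are only illustrative.

```r
# Same seven rows as in the Data section of this answer.
dat <- data.frame(
  Date  = rep(19990104L, 7),
  Time  = c("14:11:14", "14:11:21", "14:11:36", "14:11:45",
            "14:11:56", "14:12:14", "14:12:21"),
  Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)
)
dat$min <- substr(dat$Time, 1, 5)

res <- aggregate(dat$Price, dat[, c("Date", "min")], function(Price)
  c(Start = Price[1], End = Price[length(Price)],
    Low = min(Price), High = max(Price)))

# res has three columns: Date, min, and the matrix column x.
ncol(res)

# Flatten the matrix column into ordinary columns.
flat <- do.call(data.frame, res)
names(flat)
```

After flattening, columns such as `flat$x.Start` can be selected or renamed like any other column.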

dplyr

library(dplyr)
dat %>%
  arrange(Date, Time) %>%
  group_by(Date, min = substr(Time, 1, 5)) %>%
  summarize(Time = min(Time), Start = first(Price), End = last(Price), Low = min(Price), High = max(Price)) %>%
  ungroup() %>%
  select(-min)
# # A tibble: 2 x 6
#       Date Time     Start   End   Low  High
#      <int> <chr>    <int> <int> <int> <int>
# 1 19990104 14:11:14   220   215   200   221
# 2 19990104 14:12:14   229   209   209   229

Data

dat <- structure(list(Date = c(19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L, 19990104L), Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56", "14:12:14", "14:12:21"),     Price = c(220L, 200L, 221L, 202L, 215L, 229L, 209L)), class = "data.frame", row.names = c(NA, -7L))

Answer 3

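First, create a data frame (d) from your example.
Then use regular expressions to extract the hours, minutes, and seconds.
Finally, call summarize() on the grouped and sorted data to get the output you want.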

library(tibble)
library(dplyr)

d <- tribble(
  ~Date,      ~Time,      ~Price,
  19990104,  "14:11:14",  220, 
  19990104,  "14:11:21",  200, 
  19990104,  "14:11:36",  221,  
  19990104,  "14:11:45",  202,  
  19990104,  "14:11:56",  215 
)


d %>%
  # Split "HH:MM:SS" into parts; "\\1" refers to the first capture group
  mutate(hour = gsub("^([0-9]{2}):.*$", "\\1", Time),
         minute = gsub("^.*:([0-9]{2}):.*$", "\\1", Time),
         seconds = gsub("^.*:.*:([0-9]{2})$", "\\1", Time),
         totalseconds = (as.numeric(hour) * 60 * 60) + (as.numeric(minute) * 60) + as.numeric(seconds)) %>%
  group_by(Date, hour, minute) %>%
  arrange(Date, hour, minute, seconds) %>%
  # Drop the grouping afterwards so hour and minute can be removed below
  summarize(Start = first(Price),
            End = last(Price),
            Low = min(Price),
            High = max(Price),
            .groups = "drop") %>%
  mutate(Time = paste0(hour, ":", minute, ":00")) %>%
  select(-hour, -minute) %>%
  relocate(Time, .before = Start)

Answer 4

The dplyr package should help you here. Assuming the data is already sorted by date and time, the code below should group it the way you want:

library(dplyr)

data = data.frame(Date = rep("19990104", 5),
                  Time = c("14:11:14", "14:11:21", "14:11:36", "14:11:45", "14:11:56"),
                  Price = c(220, 200, 221, 202, 215),
                  stringsAsFactors = FALSE)


data_processed = data %>%
  dplyr::mutate(Time_min = paste(substr(Time, start = 1, stop = 5), "00", sep = ":"))

data_summary <- data_processed %>%
  dplyr::group_by(Date, Time_min) %>%
  dplyr::summarise(Start = dplyr::first(Price),
                   End = dplyr::last(Price),
                   Low = min(Price, na.rm = TRUE),
                   High = max(Price, na.rm = TRUE))
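The Start and End values above are only correct if the rows are already in chronological order. If that is not guaranteed, the rows can be sorted first. The following base R sketch assumes that `Time` is a zero-padded `HH:MM:SS` string, so that sorting it as text matches sorting it as a time; `shuffled` and `sorted` are hypothetical names used only for this illustration.

```r
# The question's rows, deliberately out of order.
shuffled <- data.frame(
  Date  = rep("19990104", 5),
  Time  = c("14:11:45", "14:11:14", "14:11:56", "14:11:21", "14:11:36"),
  Price = c(202, 220, 215, 200, 221),
  stringsAsFactors = FALSE
)

# Assumption: zero-padded time strings, so text order equals time order.
sorted <- shuffled[order(shuffled$Date, shuffled$Time), ]

# The earliest and latest prices are now the first and last rows.
sorted$Price[1]
sorted$Price[nrow(sorted)]
```

In a dplyr pipeline, `dplyr::arrange(Date, Time)` placed before the `group_by()` step does the same job.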

Answer 5

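to.minutes in the quantmod package does exactly this. Assuming DF as shown reproducibly in the Note at the end, convert it to a zoo object, use to.minutes to perform the required calculation, giving zm, and round the minutes down, giving zm0. Finally, we use fortify.zoo to convert the result to a data frame; however, you may prefer to keep it as zm0 to make it easier to use the other facilities of quantmod and zoo. Note that quantmod provides the extractor functions Hi, Lo, Op, and Cl, as well as functions for plotting OHLC series.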

library(quantmod) # also loads zoo 
library(lubridate)

# this requires R 4.1.  Replace \ with the word function if
#   you have an old version of R    
z <- read.zoo(DF, index = 1:2, FUN = \(d, t) ymd_hms(paste(d, t)))

zm <- to.minutes(z)
zm0 <- aggregate(zm, floor_date(time(zm), "min"))

DF2 <- fortify.zoo(zm0); DF2
##                 Index z.Open z.High z.Low z.Close
## 1 1999-01-04 14:11:00    220    221   200     215

Cl(DF2)
## [1] 215

Versions used

R.version.string
## [1] "R version 4.1.1 Patched (2021-08-10 r80733)"

packageVersion("quantmod")
## [1] ‘0.4.18’

packageVersion("zoo")
## [1] ‘1.8.9’

packageVersion("lubridate")
## [1] ‘1.7.10’

Note

Lines <- "Date      Time      Price
19990104  14:11:14  220 
19990104  14:11:21  200 
19990104  14:11:36  221  
19990104  14:11:45  202  
19990104  14:11:56  215  "

DF <- read.table(text = Lines, header = TRUE)
