R: Fill in all elements of a datetime sequence from patchy periodic datetime information

I don't think I even know how to title this question, but I believe it is a fairly common data-manipulation requirement.

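I have data describing the periodic exchange of a quantity of goods between two parties. The exchanges take place hourly. Here is an example data frame: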

df <- cbind.data.frame(Seller = as.character(c("A","A","A","A","A","A")), 
                       Buyer = c("B","B","B","C","C","C"),
                       DateTimeFrom = c("1/07/2013 0:00","1/07/2013 9:00","1/07/2013 0:00","1/07/2013 6:00","1/07/2013 8:00","2/07/2013 9:00"),
                       DateTimeTo = c("1/07/2013 8:00","1/07/2013 15:00","2/07/2013 8:00","1/07/2013 9:00","1/07/2013 12:00","2/07/2013 16:00"),
                       Qty = c(50,10,20,25,5,5)
                       )

df$DateTimeFrom <- as.POSIXct(df$DateTimeFrom, format = '%d/%m/%Y %H:%M', tz = 'GMT')
df$DateTimeTo <- as.POSIXct(df$DateTimeTo, format = '%d/%m/%Y %H:%M', tz = 'GMT')

> df
  Seller Buyer        DateTimeFrom          DateTimeTo Qty
1      A     B 2013-07-01 00:00:00 2013-07-01 08:00:00  50
2      A     B 2013-07-01 09:00:00 2013-07-01 15:00:00  10
3      A     B 2013-07-01 00:00:00 2013-07-02 08:00:00  20
4      A     C 2013-07-01 06:00:00 2013-07-01 09:00:00  25
5      A     C 2013-07-01 08:00:00 2013-07-01 12:00:00   5
6      A     C 2013-07-02 09:00:00 2013-07-02 16:00:00   5

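So, for example, the first row of this data frame means that seller "A" sells 50 items to buyer "B" every hour from midnight until 8 a.m. on 1 July 2013. You may also notice that some exchanges between the same two parties overlap, just with different negotiated quantities.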

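What I need to do (and need your help with) is generate a sequence covering every hour in the two-day period that sums, for each pair of parties, the total quantity exchanged in that hour across all negotiations. This would be the resulting data frame.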

DateTimeSeq <- data.frame(seq(ISOdate(2013,7,1,0),by = "hour", length.out = 48))
colnames(DateTimeSeq) <- c("DateTime")

#What the Answer should be
DateTimeSeq$QtyAB <- c(70,70,70,70,70,70,70,70,70,30,30,30,30,30,30,30,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,20,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
DateTimeSeq$QtyAC <- c(0,0,0,0,0,0,25,25,30,30,5,5,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,5,5,5,5,5,5,5,0,0,0,0,0,0,0)

> DateTimeSeq
              DateTime QtyAB QtyAC
1  2013-07-01 00:00:00    70     0
2  2013-07-01 01:00:00    70     0
3  2013-07-01 02:00:00    70     0
4  2013-07-01 03:00:00    70     0
5  2013-07-01 04:00:00    70     0
6  2013-07-01 05:00:00    70     0
7  2013-07-01 06:00:00    70    25
8  2013-07-01 07:00:00    70    25
9  2013-07-01 08:00:00    70    30
10 2013-07-01 09:00:00    30    30
11 2013-07-01 10:00:00    30     5
12 2013-07-01 11:00:00    30     5
13 2013-07-01 12:00:00    30     5
14 2013-07-01 13:00:00    30     0
15 2013-07-01 14:00:00    30     0
.... etc

Can anyone lend a hand?

Thanks, A

Here is my solution, which uses the dplyr and reshape packages.

library(dplyr)
library(reshape)

First, we should expand the data frame so that everything is in hourly format. This can be done with the do part of dplyr.

df %>% rowwise() %>% 
  do(data.frame(Seller=.$Seller, 
                Buyer=.$Buyer,
                Qty=.$Qty,
                DateTimeCurr=seq(from=.$DateTimeFrom, to=.$DateTimeTo, by="hour")))

Output:

Source: local data frame [66 x 4]
Groups: <by row>

   Seller Buyer Qty        DateTimeCurr
1       A     B  50 2013-07-01 00:00:00
2       A     B  50 2013-07-01 01:00:00
3       A     B  50 2013-07-01 02:00:00
...    
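
As a small check of the expansion step, here is a sketch in base R that expands only the first row of the example. It assumes, as the expected output in the question implies, that both the start hour and the end hour count; seq() with by = "hour" includes the end point when it falls exactly on an hourly step.

```r
# First row of the example: 50 units per hour, 00:00 to 08:00 on 1 July 2013
fmt  <- "%d/%m/%Y %H:%M"
from <- as.POSIXct("1/07/2013 0:00", format = fmt, tz = "GMT")
to   <- as.POSIXct("1/07/2013 8:00", format = fmt, tz = "GMT")

# Both endpoints are included, so this row expands to nine hourly slots
hours <- seq(from = from, to = to, by = "hour")
length(hours)
format(range(hours), "%Y-%m-%d %H:%M")
```

Applied to all six rows, the same expansion gives 9 + 7 + 33 + 4 + 5 + 8 = 66 rows, which matches the 66 x 4 data frame shown above.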

From there, building the right ID and summing the totals with the group_by function is trivial.

df1 <- df %>% rowwise() %>% 
  do(data.frame(Seller=.$Seller, 
                Buyer=.$Buyer,
                Qty=.$Qty,
                DateTimeCurr=seq(from=.$DateTimeFrom, to=.$DateTimeTo, by="hour"))) %>%
  group_by(Seller, Buyer, DateTimeCurr) %>%
  summarise(TotalQty=sum(Qty)) %>% 
  mutate(id=paste0("Qty", Seller, Buyer))

Output:

Source: local data frame [48 x 5]
Groups: Seller, Buyer

   Seller Buyer        DateTimeCurr TotalQty    id
1       A     B 2013-07-01 00:00:00       70 QtyAB
2       A     B 2013-07-01 01:00:00       70 QtyAB
3       A     B 2013-07-01 02:00:00       70 QtyAB
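
For anyone who wants to check the grouping step without dplyr, the following is a rough base-R equivalent, not the code used above. It rebuilds the example data from the question, expands each row with lapply(), and sums overlapping contracts with aggregate(); the names expanded and totals are my own.

```r
# Rebuild the example data (same values as in the question)
fmt <- "%d/%m/%Y %H:%M"
df <- data.frame(
  Seller = rep("A", 6),
  Buyer  = c("B", "B", "B", "C", "C", "C"),
  DateTimeFrom = as.POSIXct(c("1/07/2013 0:00", "1/07/2013 9:00", "1/07/2013 0:00",
                              "1/07/2013 6:00", "1/07/2013 8:00", "2/07/2013 9:00"),
                            format = fmt, tz = "GMT"),
  DateTimeTo   = as.POSIXct(c("1/07/2013 8:00", "1/07/2013 15:00", "2/07/2013 8:00",
                              "1/07/2013 9:00", "1/07/2013 12:00", "2/07/2013 16:00"),
                            format = fmt, tz = "GMT"),
  Qty = c(50, 10, 20, 25, 5, 5),
  stringsAsFactors = FALSE
)

# Expand each row into one record per hour (both endpoints included)
expanded <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  data.frame(Seller = df$Seller[i],
             Buyer  = df$Buyer[i],
             Qty    = df$Qty[i],
             DateTimeCurr = seq(df$DateTimeFrom[i], df$DateTimeTo[i], by = "hour"),
             stringsAsFactors = FALSE)
}))

# Sum overlapping contracts per seller, buyer and hour
totals <- aggregate(Qty ~ Seller + Buyer + DateTimeCurr, data = expanded, FUN = sum)
nrow(expanded)
nrow(totals)
```

The row counts should agree with the two outputs shown above (66 expanded rows, 48 summarised rows).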

From this data frame, all we have to do is cast it into the format you gave above.

> cast(df1,  DateTimeCurr~ id, value="TotalQty")
          DateTimeCurr QtyAB QtyAC
1  2013-07-01 00:00:00    70    NA
2  2013-07-01 01:00:00    70    NA
3  2013-07-01 02:00:00    70    NA
4  2013-07-01 03:00:00    70    NA
5  2013-07-01 04:00:00    70    NA
6  2013-07-01 05:00:00    70    NA

So the whole code is:

library(dplyr)
library(reshape)

# Expand each contract into one row per hour, then total overlapping contracts
df1 <- df %>% rowwise() %>% 
  do(data.frame(Seller=.$Seller, 
                Buyer=.$Buyer,
                Qty=.$Qty,
                DateTimeCurr=seq(from=.$DateTimeFrom, to=.$DateTimeTo, by="hour"))) %>%
  group_by(Seller, Buyer, DateTimeCurr) %>%
  summarise(TotalQty=sum(Qty)) %>% 
  mutate(id=paste0("Qty", Seller, Buyer))

# Reshape to one column per seller-buyer pair
cast(df1,  DateTimeCurr~ id, value="TotalQty")
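
One caveat: the cast output shows NA where a pair has no contract in a given hour, and it only contains hours that occur in the expanded data, whereas the question asks for zeros over the full 48-hour sequence. As a possible way to close that gap, here is a base-R sketch that takes a different route from the dplyr/reshape code above: it evaluates each hour of the question's grid directly against the contract intervals. The helper hourly_total is a hypothetical name of mine, and the sketch assumes both interval endpoints are inclusive.

```r
# Rebuild the example data (same values as in the question)
fmt <- "%d/%m/%Y %H:%M"
df <- data.frame(
  Seller = rep("A", 6),
  Buyer  = c("B", "B", "B", "C", "C", "C"),
  DateTimeFrom = as.POSIXct(c("1/07/2013 0:00", "1/07/2013 9:00", "1/07/2013 0:00",
                              "1/07/2013 6:00", "1/07/2013 8:00", "2/07/2013 9:00"),
                            format = fmt, tz = "GMT"),
  DateTimeTo   = as.POSIXct(c("1/07/2013 8:00", "1/07/2013 15:00", "2/07/2013 8:00",
                              "1/07/2013 9:00", "1/07/2013 12:00", "2/07/2013 16:00"),
                            format = fmt, tz = "GMT"),
  Qty = c(50, 10, 20, 25, 5, 5),
  stringsAsFactors = FALSE
)

# Full 48-hour grid, built the same way as DateTimeSeq in the question
grid <- data.frame(DateTime = seq(ISOdate(2013, 7, 1, 0), by = "hour", length.out = 48))

# Hypothetical helper: total quantity sold to one buyer at each hour of the grid.
# A contract counts for an hour when DateTimeFrom <= hour <= DateTimeTo.
hourly_total <- function(buyer) {
  rows <- df[df$Buyer == buyer, ]
  vapply(seq_len(nrow(grid)), function(i) {
    h <- grid$DateTime[i]
    sum(rows$Qty[rows$DateTimeFrom <= h & h <= rows$DateTimeTo])  # sum of nothing is 0
  }, numeric(1))
}

grid$QtyAB <- hourly_total("B")
grid$QtyAC <- hourly_total("C")
head(grid, 15)
```

Because sum() over an empty selection is 0, hours with no active contract come out as 0 rather than NA, and every hour of the grid is present regardless of whether any contract covers it.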