R: Calculate differences between rows in data.table

Rprof shows that the following operation, which I run repeatedly, is rather slow:

stockHistory[.(p), stock:=stockHistory[.(p), stock] - (backorderedDemands[.(p-1),backlog] - backorderedDemands[.(p),backlog])]

I suspect the cause is the subtraction

backorderedDemands[.(p-1),backlog] - backorderedDemands[.(p),backlog]

Is there any way to speed up this operation?

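.(p) subsets the data.table to period p, and .(p-1) subsets it to the previous period (see the sample data below). Would applying some kind of diff() here be faster? I don't know how to do that, though.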

Sample data:

backorderedDemands<-CJ(period=1:1000, articleID=letters[1:10], backlog=0)[,backlog:=round(runif(10000)*42,0)]
setkey(backorderedDemands,period, articleID)
stockHistory<-CJ(period=1:1000, articleID=letters[1:10], stock=0)[,stock:=round(runif(10000)*42+66,0)]
setkey(stockHistory,period, articleID)

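If you want to compute the first difference of your data, you can do it as shown below. It is fast, and I have included the intermediate columns so each step of the calculation is visible.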

library(data.table)
library(dplyr)

Data

set.seed(1)

backorderedDemands <- 
    CJ(period = 1:1000, 
       articleID = letters[1:10], 
       backlog = 0)[,backlog:= round(runif(10000) * 42, 0)]

stockHistory <- 
    CJ(period = 1:1000, 
       articleID = letters[1:10], 
       stock = 0)[, stock:= round(runif(10000) * 42 + 66, 0)]

Solution

    merge(stockHistory, backorderedDemands, 
      by = c("period", "articleID")) %>% 
    group_by(articleID) %>%
    mutate(lag_backlog = lag(backlog, 1),
           my_backlog_diff = backlog - lag_backlog,
           my_diff = stock + my_backlog_diff) %>% 
    as.data.frame(.) %>% 
    head(., 20)

   period articleID stock backlog lag_backlog my_backlog_diff my_diff
1       1         a    69      11          NA              NA      NA
2       1         b    94      16          NA              NA      NA
3       1         c    97      24          NA              NA      NA
4       1         d    71      38          NA              NA      NA
5       1         e    68       8          NA              NA      NA
6       1         f    71      38          NA              NA      NA
7       1         g   103      40          NA              NA      NA
8       1         h   101      28          NA              NA      NA
9       1         i   102      26          NA              NA      NA
10      1         j    67       3          NA              NA      NA
11      2         a    71       9          11              -2      69
12      2         b    89       7          16              -9      80
13      2         c    71      29          24               5      76
14      2         d    96      16          38             -22      74
15      2         e    96      32           8              24     120
16      2         f    99      21          38             -17      82
17      2         g    92      30          40             -10      82
18      2         h    87      42          28              14     101
19      2         i    85      16          26             -10      75
20      2         j    67      33           3              30      97

You can first compute a difference column in backorderedDemands, per article.

backorderedDemands[, diff := c(NA, -diff(backlog)), by=articleID]
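To see what this expression produces, here is a minimal sketch on a toy vector (the values are made up for illustration). diff() returns current minus previous, so negating it gives previous minus current, which is the term subtracted in the question. The leading NA pads the first period, which has no predecessor.

```r
# Hypothetical backlog values for one article over three periods
backlog <- c(11, 9, 14)

# diff(backlog) is current - previous:  -2,  5
# Negating gives previous - current:     2, -5
# The leading NA stands for period 1, which has no previous period
backlog_diff <- c(NA, -diff(backlog))
print(backlog_diff)  # NA, 2, -5
```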

There is also no need to use stockHistory[.(p), stock]. Plain stock is enough, because j is already evaluated on the rows selected by .(p).

# stockHistoryNew is not defined in this answer; presumably it is a copy of stockHistory
stockHistoryNew[.(p), stock:=stock - backorderedDemands[.(p), diff]]
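If the update for each period depends only on that period's backlog difference, as in the question's expression, the loop over p can probably be dropped altogether. The following is a sketch under that assumption, not the answerer's code: it uses data.table's shift() for the lagged value and an update join on period and articleID. The names backlog_diff and stockNew are introduced here for illustration, and leaving period 1 unchanged is an assumption about the intended behaviour.

```r
library(data.table)

# Sample data as in the question (CJ returns a table keyed on its columns)
set.seed(1)
backorderedDemands <- CJ(period = 1:1000, articleID = letters[1:10])
backorderedDemands[, backlog := round(runif(.N) * 42)]
stockHistory <- CJ(period = 1:1000, articleID = letters[1:10])
stockHistory[, stock := round(runif(.N) * 42 + 66)]

# Previous backlog minus current backlog, computed per article.
# shift() lags by one row; period 1 gets NA because it has no predecessor.
backorderedDemands[, backlog_diff := shift(backlog) - backlog, by = articleID]

# Work on a copy so the original stock values stay available for comparison
stockNew <- copy(stockHistory)

# Update join: match rows on (period, articleID) and subtract the difference.
# Rows with an NA difference (period 1) are excluded, so their stock is unchanged.
stockNew[backorderedDemands[!is.na(backlog_diff)],
         on = .(period, articleID),
         stock := stock - i.backlog_diff]
```

This touches every period in one pass instead of once per value of p. Whether it is a drop-in replacement depends on whether anything else happens inside the original loop.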