R: Calculate differences between rows in data.table
Rprof reveals that the following operation I perform is quite slow:
stockHistory[.(p), stock:=stockHistory[.(p), stock] - (backorderedDemands[.(p-1),backlog] - backorderedDemands[.(p),backlog])]
I suppose this is because of the subtraction
backorderedDemands[.(p-1),backlog] - backorderedDemands[.(p),backlog]
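Is there any way to speed up this operation?
.(p) subsets the data.table for period p, and .(p-1) subsets it for the previous period (see the sample data below). Might applying some kind of diff() here be faster? I don't know how to do that, though.
Sample data: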
backorderedDemands<-CJ(period=1:1000, articleID=letters[1:10], backlog=0)[,backlog:=round(runif(10000)*42,0)]
setkey(backorderedDemands,period, articleID)
stockHistory<-CJ(period=1:1000, articleID=letters[1:10], stock=0)[,stock:=round(runif(10000)*42+66,0)]
setkey(stockHistory,period, articleID)
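If you want to compute the first difference of your data, you can do it as shown below. It is fast, and I have included the step-by-step calculation.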
library(data.table)
library(dplyr)
Data
set.seed(1)
backorderedDemands <-
  CJ(period = 1:1000,
     articleID = letters[1:10],
     backlog = 0)[, backlog := round(runif(10000) * 42, 0)]
stockHistory <-
  CJ(period = 1:1000,
     articleID = letters[1:10],
     stock = 0)[, stock := round(runif(10000) * 42 + 66, 0)]
Solution
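# Join stock and backlog on (period, articleID), then difference within each article
merge(stockHistory, backorderedDemands,
      by = c("period", "articleID")) %>%
  group_by(articleID) %>%
  mutate(lag_backlog = lag(backlog, 1),                  # backlog of the previous period
         my_backlog_diff = backlog - lag_backlog,        # backlog[p] - backlog[p-1]
         my_diff = stock + my_backlog_diff) %>%          # stock - (backlog[p-1] - backlog[p])
  as.data.frame(.) %>%
  head(., 20)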
   period articleID stock backlog lag_backlog my_backlog_diff my_diff
1       1         a    69      11          NA              NA      NA
2       1         b    94      16          NA              NA      NA
3       1         c    97      24          NA              NA      NA
4       1         d    71      38          NA              NA      NA
5       1         e    68       8          NA              NA      NA
6       1         f    71      38          NA              NA      NA
7       1         g   103      40          NA              NA      NA
8       1         h   101      28          NA              NA      NA
9       1         i   102      26          NA              NA      NA
10      1         j    67       3          NA              NA      NA
11      2         a    71       9          11              -2      69
12      2         b    89       7          16              -9      80
13      2         c    71      29          24               5      76
14      2         d    96      16          38             -22      74
15      2         e    96      32           8              24     120
16      2         f    99      21          38             -17      82
17      2         g    92      30          40             -10      82
18      2         h    87      42          28              14     101
19      2         i    85      16          26             -10      75
20      2         j    67      33           3              30      97
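For anyone who would rather stay inside data.table, the same lagged difference can probably be written with shift() instead of dplyr's lag(). This is only a sketch under the sample-data assumptions above; the name res is mine, and the column names are copied from the dplyr version.

```r
library(data.table)

# Same shape as the sample data: 1000 periods x 10 articles.
set.seed(1)
backorderedDemands <- CJ(period = 1:1000, articleID = letters[1:10])
backorderedDemands[, backlog := round(runif(.N) * 42, 0)]
stockHistory <- CJ(period = 1:1000, articleID = letters[1:10])
stockHistory[, stock := round(runif(.N) * 42 + 66, 0)]

# Join on (period, articleID); the result is sorted by these columns.
res <- merge(stockHistory, backorderedDemands, by = c("period", "articleID"))

# shift() lags by one row within each article; the first period gets NA.
res[, lag_backlog := shift(backlog, 1L), by = articleID]
res[, my_backlog_diff := backlog - lag_backlog]
res[, my_diff := stock + my_backlog_diff]

head(res, 20)
```

Because := adds the columns by reference, no intermediate copies of the merged table are made.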
You can first compute a difference column in backorderedDemands.
backorderedDemands[, diff := c(NA, -diff(backlog)), by=articleID]
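To see what that line produces, here is a tiny made-up table (the name toy and its values are purely illustrative, not from the question). Within each article, -diff(backlog) equals backlog[p-1] - backlog[p], and the leading NA fills the first period, which has no predecessor.

```r
library(data.table)

# Toy backlog for two articles over three periods (made-up values).
toy <- data.table(
  period    = rep(1:3, each = 2),
  articleID = rep(c("a", "b"), times = 3),
  backlog   = c(11, 16, 9, 7, 14, 7)
)
setkey(toy, period, articleID)

# Article a has backlog 11, 9, 14; article b has 16, 7, 7.
toy[, diff := c(NA, -diff(backlog)), by = articleID]
print(toy)
# diff for article a: NA,  2, -5
# diff for article b: NA,  9,  0
```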
There is also no need to use stockHistory[.(p), stock]. Plain stock is enough, because j is already evaluated on the rows selected by .(p).
# stockHistoryNew is not defined in the question; presumably a keyed copy of stockHistory
stockHistoryNew[.(p), stock:=stock - backorderedDemands[.(p), diff]]
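If the update runs for every period in a loop, it may be possible to drop the loop entirely and do one update join. The sketch below assumes the loop covers periods 2 to 1000 (the question does not say), and the names loopVersion and vecVersion are mine. It rebuilds the sample data, runs the original per-period update on a copy as a reference, and checks that the join gives the same stock values.

```r
library(data.table)

set.seed(1)
backorderedDemands <- CJ(period = 1:1000, articleID = letters[1:10])
backorderedDemands[, backlog := round(runif(.N) * 42, 0)]
stockHistory <- CJ(period = 1:1000, articleID = letters[1:10])
stockHistory[, stock := round(runif(.N) * 42 + 66, 0)]
setkey(backorderedDemands, period, articleID)
setkey(stockHistory, period, articleID)

# Reference: the per-period update from the question (assumed range 2..1000).
loopVersion <- copy(stockHistory)
for (p in 2:1000) {
  loopVersion[.(p), stock := stock -
    (backorderedDemands[.(p - 1), backlog] - backorderedDemands[.(p), backlog])]
}

# Precompute backlog[p-1] - backlog[p] once per article.
backorderedDemands[, diff := c(NA, -diff(backlog)), by = articleID]

# Single update join; i.diff refers to the diff column of the table in i.
# Rows with NA diff (period 1) are skipped, so their stock stays unchanged.
vecVersion <- copy(stockHistory)
vecVersion[backorderedDemands[!is.na(diff)], on = .(period, articleID),
           stock := stock - i.diff]

all.equal(loopVersion$stock, vecVersion$stock)
```

The join matches on both period and articleID, so it does not rely on the two tables happening to be in the same row order.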