How to generate sequence of dates by group
Suppose we have the following data.table:
set.seed(7)
library(data.table)
library(zoo)
dt <- data.table(ID=c('a','a','a','b','b'), Tag=c(1,2,3,1,2), Begin=c('2015-01-01', '2014-05-07', '2014-08-02', '2015-02-03','2013-08-09'), x=rnorm(5), y = rnorm(5), z = rnorm(5))
dt[,Begin:=as.Date(Begin, '%Y-%m-%d')]
which returns
   ID Tag      Begin          x          y         z
1:  a   1 2015-01-01  2.2872472 -0.9472799 0.3569862
2:  a   2 2014-05-07 -1.1967717  0.7481393 2.7167518
3:  a   3 2014-08-02 -0.6942925 -0.1169552 2.2814519
4:  b   1 2015-02-03 -0.4122930  0.1526576 0.3240205
5:  b   2 2013-08-09 -0.9706733  2.1899781 1.8960671
The Begin column holds dates, and I want to extend each Begin date over the following 2 months. I applied the following code:
dt[, Date := seq(from = Begin, to = Begin+months(2), by = '1 months'), by = .(ID, Tag)]
But I get the following warnings:
Warning messages:
1: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin, :
RHS 1 is length 3 (greater than the size (1) of group 1). The last 2 element(s) will be discarded.
2: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin, :
RHS 1 is length 3 (greater than the size (1) of group 2). The last 2 element(s) will be discarded.
3: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin, :
RHS 1 is length 3 (greater than the size (1) of group 3). The last 2 element(s) will be discarded.
4: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin, :
RHS 1 is length 3 (greater than the size (1) of group 4). The last 2 element(s) will be discarded.
5: In `[.data.table`(dt, , `:=`(Date, seq(from = Begin, :
RHS 1 is length 3 (greater than the size (1) of group 5). The last 2 element(s) will be discarded.
The result I expect is
    ID Tag       Date          x          y         z
 1:  a   1 2015-01-01  2.2872472 -0.9472799 0.3569862
 2:  a   1 2015-02-01  2.2872472 -0.9472799 0.3569862
 3:  a   1 2015-03-01  2.2872472 -0.9472799 0.3569862
 4:  a   2 2014-05-07 -1.1967717  0.7481393 2.7167518
 5:  a   2 2014-06-07 -1.1967717  0.7481393 2.7167518
 6:  a   2 2014-07-07 -1.1967717  0.7481393 2.7167518
 7:  a   3 2014-08-02 -0.6942925 -0.1169552 2.2814519
 8:  a   3 2014-09-02 -0.6942925 -0.1169552 2.2814519
 9:  a   3 2014-10-02 -0.6942925 -0.1169552 2.2814519
10:  b   1 2015-02-03 -0.4122930  0.1526576 0.3240205
11:  b   1 2015-03-03 -0.4122930  0.1526576 0.3240205
12:  b   1 2015-04-03 -0.4122930  0.1526576 0.3240205
13:  b   2 2013-08-09 -0.9706733  2.1899781 1.8960671
14:  b   2 2013-09-09 -0.9706733  2.1899781 1.8960671
15:  b   2 2013-10-09 -0.9706733  2.1899781 1.8960671
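I guess this happens because I may not have a unique key.
Note that my sample data only has x, y and z, but my real dataset has more than 10 columns.
Could you give me some suggestions?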
We group by the sequence of rows, as the "ID", "Tag" groups have duplicate elements.
dt[, list(Date = seq(Begin, length.out=3, by = '1 month'), x,y,z), by = 1:nrow(dt)]
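Since the real dataset has more than 10 columns, typing every column name into the list gets tedious. A minimal sketch of the same by-row idea that carries all other columns along through .SD; the small table, the row_id grouping name and n_months are made up for illustration, not taken from the question:

```r
library(data.table)

# Small illustrative table (values are made up)
dt <- data.table(
  ID    = c("a", "a", "b"),
  Tag   = c(1, 2, 1),
  Begin = as.Date(c("2015-01-01", "2014-05-07", "2015-02-03")),
  x     = c(0.1, 0.2, 0.3),
  y     = c(1.1, 1.2, 1.3)
)

n_months <- 3L                            # start date plus the next 2 months
keep     <- setdiff(names(dt), "Begin")   # every column except Begin

# One group per row: Date has n_months values, while the single-row
# columns coming from .SD are recycled to the same length.
res <- dt[,
  c(list(Date = seq(Begin, length.out = n_months, by = "1 month")), .SD),
  by = .(row_id = seq_len(nrow(dt))),
  .SDcols = keep
]
res[, row_id := NULL]                     # drop the helper grouping column
print(res)
```

Grouping by a row counter rather than by .(ID, Tag) means the result does not depend on those two columns identifying a row uniquely.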
Or, as @David Arenburg mentioned, we replicate each row 3 times, then group by "ID", "Tag" and build the sequence from only the first observation of "Begin":
dt[rep(1:.N, each = 3)][, Begin := seq(Begin[1L],
length.out=3, by = '1 month'), by = .(ID, Tag)][]
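The warnings in the question arise because := assigns in place and cannot add rows: each (ID, Tag) group has one row, while seq() returns three dates. A small sketch of the replicate-then-assign idea in two explicit steps, on made-up data and assuming (ID, Tag) identifies each original row uniquely; the names long and n are illustrative:

```r
library(data.table)

# Small illustrative table (values are made up)
dt <- data.table(
  ID    = c("a", "b"),
  Tag   = c(1, 1),
  Begin = as.Date(c("2015-01-01", "2015-02-03")),
  x     = c(0.5, -0.5)
)

n <- 3L  # start date plus the next 2 months

# Step 1: repeat every row n times so each group has room for n dates
long <- dt[rep(seq_len(.N), each = n)]

# Step 2: the right-hand side now has exactly .N values per group,
# so := no longer has anything to discard
long[, Date := seq(Begin[1L], length.out = .N, by = "1 month"), by = .(ID, Tag)]
print(long)
```

Writing to a new Date column keeps the original Begin value next to each generated date, which can help when checking the result.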