我怎样才能为每个账户从他们的第一个交易日期开始分别设置特定的时间段?

How can I make specific periods for each account separately starting from their first transaction date?

这是我的交易数据:

id          from_id        to_id      amount    date_trx
<fctr>      <fctr>         <fctr>     <dbl>     <date>
0           7468           5695       700.0     2005-01-04
1           6213           9379       11832.0   2005-01-08
2           7517           8170       1000.0    2005-01-10
3           6143           9845       4276.0    2005-01-12
4           6254           9640       200.0     2005-01-14
5           6669           5815       200.0     2005-01-20
6           6934           8583       49752.0   2005-01-24
7           9240           8314       19961.0   2005-01-26
8           6374           8865       1000.0    2005-01-30
9           6143           6530       13.4      2005-01-31
...

我想根据时间间隔构建新功能。

让我们看看这个:

id               from_id        to_id      amount       date_trx
<fctr>           <fctr>         <fctr>     <dbl>        <date>
149431           5370           5735       1000.0       2007-03-24
157403           5370           7058       3679.0       2007-04-13
158831           5370           8667       12600.0      2007-04-23
162680           5370           6053       19.2         2007-04-30
167082           5370           8165       3679.0       2007-05-13
168562           5370           5656       2100.0       2007-05-23
172578           5370           5929       79.0         2007-05-31
177507           5370           6725       3679.0       2007-06-01
179167           5370           8433       200.0        2007-06-22
183499           5370           7022       100.6        2007-06-29
...

比方说,我想计算每个帐户的交易金额,例如,以周为单位。
所以,从2007-03-24开始,5370的每周交易金额历史如下:

in the 1st week(2007-03-24 - 2007-03-31):  1000.0          
in the 2nd week(2007-03-31 - 2007-04-07):  0.0          
in the 3rd week(2007-04-07 - 2007-04-14):  3679.0          
in the 4th week(2007-04-14 - 2007-04-21):  0.0          
in the 5th week(2007-04-21 - 2007-04-28):  12600.0          
in the 6th week(2007-04-28 - 2007-05-05):  19.2          
in the 7th week(2007-05-05 - 2007-05-12):  0.0          
in the 8th week(2007-05-12 - 2007-05-19):  3679.0          
in the 9th week(2007-05-19 - 2007-05-26):  2100.0          
in the 10th week(2007-05-26 - 2007-06-02):  79.0 + 3679.0 = 3758.0          
in the 11th week(2007-06-02 - 2007-06-09):  0.0          
in the 12th week(2007-06-09 - 2007-06-16):  0.0          
in the 13th week(2007-06-16 - 2007-06-23):  200.0          
in the 14th week(2007-06-23 - 2007-06-30):  100.6          

在这里我们看到一周内交易的最大金额 537012600.0。所以,现在我想将此度量视为一项功能,比如 max_of_weekly_transacted_amount
同样,我想计算每个帐户在一个月内交易的平均金额,并将其存储为另一个功能,比如 mean_of_monthly_transacted_amount

我试过 lubridate 函数 floor_date:

# Max of weekly transaction amount
data <- data %>% group_by(date_trx_week=floor_date(date_trx, "week"),from_id) %>% mutate(weekly_trx = sum(amount)) %>% 
   group_by(from_id) %>% mutate(max_of_weekly_transacted_amount=max(weekly_trx))%>% 
   select(-c(date_trx_week,weekly_trx))


# Mean of monthly transaction amount
data <- data %>% group_by(date_trx_month=floor_date(date_trx, "month"),from_id) %>% mutate(monthly_trx = sum(amount)) %>% 
   group_by(from_id) %>% mutate(mean_of_monthly_transacted_amount=mean(monthly_trx))%>% 
   select(-c(date_trx_month,monthly_trx))

我数据中的日期变量 date_trx2005-01-01 开头,以 2010-12-31 结尾。 floor_date2005-01-02-2005-01-09 开始一周,然后以 2005-01-09-2005-01-16 继续,依此类推。它以 2005-01-01-2005-02-01 开始月份,并以 2005-02-01-2005-03-01 继续,依此类推。并且此函数对每个帐户使用相同的周期。

但我想根据每个帐户进行的第一个交易日期为每个帐户专门制作期间。因此,from_id = 5370 的第一笔交易日期是 2007-03-24。如果我想为 5370 设置周周期,则可以是 2007-03-24 - 2007-03-312007-03-31 - 2007-04-07,依此类推。如果我想为 5370 设置月期间,它将是 2007-03-24 - 2007-04-242007-04-24 - 2007-05-24,依此类推。
对于另一个帐户,期间会有所不同。那么,我怎样才能做到这一点?我怎样才能为每个账户从他们的第一个交易日期开始分别设置特定的时间段?

以下均为Base-R。首先,自定义周。这使用 mod 7 来区分第一个条目和其余条目。希望有人可以使用这种方法并使您成为更高效的 data.table 解决方案。

df$Week <- unsplit(tapply(df$date_trx,df$from_id, function(x) as.numeric((x-x[1])) %/% 7 ),df$from_id)

然后如果你想按周取 ID 的平均值,你可以

aggregate(amount ~ from_id + Week, df, mean)

   from_id Week  amount
1     5370    0  1000.0
2     6143    0  4276.0
3     6213    0 11832.0
4     6254    0   200.0
5     6374    0  1000.0
6     6669    0   200.0
7     6934    0 49752.0
8     7468    0   700.0
9     7517    0  1000.0
10    9240    0 19961.0
11    5370    2  3679.0
12    6143    2    13.4
13    5370    4 12600.0
14    5370    5    19.2
15    5370    7  3679.0
16    5370    8  2100.0
17    5370    9  1879.0
18    5370   12   200.0
19    5370   13   100.6

当然,您可以将 mean 替换为 max 或任何其他函数。