How can I keep track of the total transaction amount received by an account over the last 6 months?

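This is my transaction data. It shows transactions made from the accounts in the from column to the accounts in the to column, along with the date and the amount.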

data 

id          from    to          date        amount  
<int>       <fctr>  <fctr>      <date>      <dbl>
19521       6644    6934        2005-01-01  700.0
19524       6753    8456        2005-01-01  600.0
19523       9242    9333        2005-01-01  1000.0
…           …       …           …           …
1056317     7819    7454        2010-12-31  60.2
1056318     6164    7497        2010-12-31  107.5
1056319     7533    7492        2010-12-31  164.1

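What I want to do is this: go through the transactions row by row and, for the account in the from column of each row, work out the total amount that account received during the 6 months leading up to the date of that particular transaction, and save this as a new column. (So the new column gives, for the account in from, the total transaction amount it received in the six months before the transaction date.)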

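For example:
In the first row, for account 6644, I should look in the to column for any transactions that 6644 received in the 6-month period "2004-07-05" to "2005-01-01", that is, up to 2005-01-01, the date on which 6644 made this transaction. If 6644 received any such transactions, I should sum them and store that sum in the new column total_trx_amount_received_in_last_6month. Likewise, for account 6753 I should find the transactions it received between "2004-07-05" and "2005-01-01", add them up, and store the sum in total_trx_amount_received_in_last_6month. I should then continue in the same way, row by row, through the whole data set.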

So, how can I do this for the whole data set?

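PS: In the date interval "2004-07-05" to "2005-01-01", "2005-01-01" is the transaction date. To get the other date, "2004-07-05", I subtracted 180 days (about 6 months) from the transaction date "2005-01-01".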
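The window start described in the PS can be checked with plain Date arithmetic in base R. This is only a small sketch; the variable names trx_date and window_start are made up for illustration.

```r
# Subtracting an integer from a Date moves it back by that many days.
trx_date     <- as.Date("2005-01-01")
window_start <- trx_date - 180

format(window_start)  # "2004-07-05"
```

Note that 180 days is only an approximation of 6 calendar months, as the question itself says.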

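To make this clearer, here is some sample data together with the output I expect. Assume these are the only transactions. Only account 5370 needs to be considered here, because the other accounts in the from column (8605, 6390, 8934) do not receive any transactions in this sample.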

id          from    to          date        amount  total_trx_amount_received_in_last_6month 
<int>       <fctr>  <fctr>      <date>      <dbl>    <dbl>
18529       5370    9356        2005-05-31  24.4     0.0
13742       5370    5605        2005-08-05  7618.0   0.0
9913        5370    8567        2005-09-12  21971.0  0.0
956         8605    5370        2005-10-05  5245.0   0.0
2557        5370    5636        2005-11-12  2921.0   5245.0    
1602        6390    5370        2005-11-26  8000.0   0.0
18669       5370    8933        2005-11-30  169.2    (5245.0+8000.0)=13245
35900       5370    8483        2006-01-31  71.5     (5245.0+8000.0)=13245
48667       8934    5370        2006-03-31  14.6     0.0
51341       5370    7626        2006-04-11  4214.0   (8000.0+14.6)=8014.6

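Here is what I did. First, note that the small data set above is sorted by date in ascending order.

In the first row, for account 5370 in the from column, I looked at the past data to see whether 5370 received any transactions between "2004-12-02" and "2005-05-31". Since the first row is the first transaction, 5370 obviously received nothing before "2005-05-31", so I recorded 0.0 in the total_trx_amount_received_in_last_6month column. In the second row, for account 5370 in the from column, 5370 did not receive any transactions between "2005-02-06" and "2005-08-05" either, so I again recorded 0.0 in total_trx_amount_received_in_last_6month. Similarly, I recorded 0.0 in rows 3 and 4, for accounts 5370 and 8605 respectively. In the fifth row, for account 5370 in the from column, 5370 received one transaction between "2005-05-16" and "2005-11-12": the one received on "2005-10-05" (row 4 of the data) for an amount of 5245.0, so I recorded 5245.0 in total_trx_amount_received_in_last_6month. In the sixth row, for account 6390 in the from column, 6390 did not receive any transactions between "2005-05-30" and "2005-11-26", so I recorded 0.0 in total_trx_amount_received_in_last_6month. The same procedure applies to all remaining rows.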
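The manual procedure above can be sketched in base R for the ten sample rows. This is an illustration only: the data frame sample_trx is retyped from the sample table, the helper received_in_window and the short column name total_received are hypothetical, and the window is assumed to be closed at both ends (from the transaction date minus 180 days up to and including the transaction date).

```r
# The ten sample rows from the table above, retyped by hand.
sample_trx <- data.frame(
  id     = c(18529L, 13742L, 9913L, 956L, 2557L,
             1602L, 18669L, 35900L, 48667L, 51341L),
  from   = c("5370", "5370", "5370", "8605", "5370",
             "6390", "5370", "5370", "8934", "5370"),
  to     = c("9356", "5605", "8567", "5370", "5636",
             "5370", "8933", "8483", "5370", "7626"),
  date   = as.Date(c("2005-05-31", "2005-08-05", "2005-09-12", "2005-10-05",
                     "2005-11-12", "2005-11-26", "2005-11-30", "2006-01-31",
                     "2006-03-31", "2006-04-11")),
  amount = c(24.4, 7618, 21971, 5245, 2921, 8000, 169.2, 71.5, 14.6, 4214),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total amount received by the sender of row i
# in the window_days days up to and including that row's date.
received_in_window <- function(trx, i, window_days = 180) {
  account   <- trx$from[i]
  end_date  <- trx$date[i]
  in_window <- trx$to == account &
    trx$date >= end_date - window_days &
    trx$date <= end_date
  sum(trx$amount[in_window])
}

# Apply the helper to every row to build the new column.
sample_trx$total_received <- vapply(
  seq_len(nrow(sample_trx)),
  function(i) received_in_window(sample_trx, i),
  numeric(1)
)
sample_trx
```

Under these assumptions, rows 5, 7, 8 and 10 come out as 5245, 13245, 13245 and 8014.6, and every other row as 0, which matches the expected column in the sample table.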

dput() output of the data:

structure(list(id = c(18529L, 13742L, 9913L, 956L, 2557L, 1602L, 
18669L, 35900L, 48667L, 51341L, 53713L, 60126L, 60545L, 65113L, 
66783L, 83324L, 87614L, 88898L, 89874L, 94765L, 100277L, 101587L, 
103444L, 108414L, 113319L, 121516L, 126607L, 130170L, 131771L, 
135002L, 149431L, 157403L, 157645L, 158831L, 162597L, 162680L, 
163901L, 165044L, 167082L, 168562L, 168940L, 172578L, 173031L, 
173267L, 177507L, 179167L, 182612L, 183499L, 188171L, 189625L, 
193940L, 198764L, 199342L, 200134L, 203328L, 203763L, 204733L, 
205651L, 209672L, 210242L, 210979L, 214532L, 214741L, 215738L, 
216709L, 220828L, 222140L, 222905L, 226133L, 226527L, 227160L, 
228193L, 231782L, 232454L, 233774L, 237836L, 237837L, 238860L, 
240223L, 245032L, 246673L, 247561L, 251611L, 251696L, 252663L, 
254410L, 255126L, 255230L, 258484L, 258485L, 259309L, 259910L, 
260542L, 262091L, 264462L, 264887L, 264888L, 266125L, 268574L, 
272959L), from = c("5370", "5370", "5370", "8605", "5370", "6390", 
"5370", "5370", "8934", "5370", "5635", "6046", "5680", "8026", 
"9037", "5370", "7816", "8046", "5492", "8756", "5370", "9254", 
"5370", "5370", "7078", "6615", "5370", "9817", "8228", "8822", 
"5735", "7058", "5370", "8667", "9315", "6053", "7990", "8247", 
"8165", "5656", "9261", "5929", "8251", "5370", "6725", "5370", 
"6004", "7022", "7442", "5370", "8679", "6491", "7078", "5370", 
"5370", "5370", "5658", "5370", "9296", "8386", "5370", "5370", 
"5370", "9535", "5370", "7541", "5370", "9621", "5370", "7158", 
"8240", "5370", "5370", "8025", "5370", "5370", "5370", "6989", 
"5370", "7059", "5370", "5370", "5370", "9121", "5608", "5370", 
"5370", "7551", "5370", "5370", "5370", "5370", "9163", "9362", 
"6072", "5370", "5370", "5370", "5370", "5370"), to = c("9356", 
"5605", "8567", "5370", "5636", "5370", "8933", "8483", "5370", 
"7626", "5370", "5370", "5370", "5370", "5370", "9676", "5370", 
"5370", "5370", "5370", "9105", "5370", "9772", "6979", "5370", 
"5370", "7564", "5370", "5370", "5370", "5370", "5370", "8744", 
"5370", "5370", "5370", "5370", "5370", "5370", "5370", "5370", 
"5370", "5370", "7318", "5370", "8433", "5370", "5370", "5370", 
"7122", "5370", "5370", "5370", "8566", "6728", "9689", "5370", 
"8342", "5370", "5370", "5614", "5596", "5953", "5370", "7336", 
"5370", "7247", "5370", "7291", "5370", "5370", "6282", "7236", 
"5370", "8866", "8613", "9247", "5370", "6767", "5370", "9273", 
"7320", "9533", "5370", "5370", "8930", "9343", "5370", "9499", 
"7693", "7830", "5392", "5370", "5370", "5370", "7497", "8516", 
"9023", "7310", "8939"), date = structure(c(12934, 13000, 13038, 
13061, 13099, 13113, 13117, 13179, 13238, 13249, 13268, 13296, 
13299, 13309, 13314, 13391, 13400, 13404, 13409, 13428, 13452, 
13452, 13460, 13482, 13493, 13518, 13526, 13537, 13542, 13544, 
13596, 13616, 13617, 13626, 13633, 13633, 13639, 13642, 13646, 
13656, 13660, 13664, 13667, 13669, 13677, 13686, 13694, 13694, 
13707, 13716, 13725, 13738, 13739, 13746, 13756, 13756, 13756, 
13761, 13769, 13770, 13776, 13786, 13786, 13786, 13791, 13799, 
13806, 13813, 13817, 13817, 13817, 13822, 13829, 13830, 13836, 
13847, 13847, 13847, 13852, 13860, 13866, 13871, 13878, 13878, 
13878, 13882, 13883, 13883, 13887, 13887, 13888, 13889, 13890, 
13891, 13895, 13896, 13896, 13899, 13905, 13909), class = "Date"), 
    amount = c(24.4, 7618, 21971, 5245, 2921, 8000, 169.2, 71.5, 
    14.6, 4214, 14.6, 13920, 14.6, 24640, 1600, 261.1, 16400, 
    3500, 2700, 19882, 182, 14.6, 16927, 25653, 3059, 2880, 9658, 
    4500, 12480, 14.6, 1000, 3679, 34430, 12600, 14.6, 19.2, 
    4900, 826, 3679, 2100, 38000, 79, 11400, 21495, 3679, 200, 
    14.6, 100.6, 3679, 5300, 108.9, 3679, 2696, 7500, 171.6, 
    14.6, 99.2, 2452, 3679, 3218, 700, 69.7, 14.6, 91.5, 2452, 
    3679, 2900, 17572, 14.6, 14.6, 90.5, 2452, 49752, 3679, 1900, 
    14.6, 870, 85.2, 2452, 3679, 1600, 540, 14.6, 14.6, 79, 210, 
    2452, 28400, 720, 180, 420, 44289, 489, 3679, 840, 2900, 
    150, 870, 420, 14.6)), row.names = c(NA, -100L), class = "data.frame")

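(I converted the from and to columns to character because they have a very large number of factor levels; otherwise the output would take up too much space.)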

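We can use map2_dbl to loop over from and date in parallel and, for each row, take the sum of the amount values whose to matches that account and whose date falls within the 6-month (180-day) window.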

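library(dplyr)
library(purrr)

# .x is the sending account of the current row, .y is its transaction date.
# For each row, sum the amounts received by .x (rows where to == .x)
# in the 180 days up to and including .y.
data %>% 
    mutate(amt = map2_dbl(from, date,
                ~sum(amount[to == .x & between(date, .y - 180, .y)])))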
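One detail worth checking is the upper end of the window. between() is inclusive at both ends, so an amount received on the transaction date itself is counted. The base-R sketch below illustrates this with rows 21 and 22 of the dput() data, which share the same date. The helper window_sum and its include_same_day argument are hypothetical and not part of dplyr or purrr.

```r
# Rows 21 and 22 of the dput() data: same date, and 5370 both sends and receives.
two_rows <- data.frame(
  id     = c(100277L, 101587L),
  from   = c("5370", "9254"),
  to     = c("9105", "5370"),
  date   = as.Date(c(13452, 13452), origin = "1970-01-01"),
  amount = c(182, 14.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sum of amounts received by the sender of row i,
# with a switch for whether same-day receipts are counted.
window_sum <- function(trx, i, include_same_day = TRUE, window_days = 180) {
  d        <- trx$date[i]
  upper_ok <- if (include_same_day) trx$date <= d else trx$date < d
  hit      <- trx$to == trx$from[i] & trx$date >= d - window_days & upper_ok
  sum(trx$amount[hit])
}

window_sum(two_rows, 1, include_same_day = TRUE)   # 14.6 (same-day receipt counted)
window_sum(two_rows, 1, include_same_day = FALSE)  # 0    (strictly earlier receipts only)
```

If only strictly earlier receipts should count, the condition in the pipeline could presumably be written as date >= .y - 180 & date < .y instead of between(date, .y - 180, .y). Which of the two is intended depends on how "last 6 months" is meant in the question.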