我如何跟踪账户最近 6 个月收到的交易总额?
How can I keep track of total transaction amount received by an account each last 6 month?
这是我的交易数据。它显示了从 from
列中的账户到 to
列中的账户的交易,以及日期和金额信息
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1056317 7819 7454 2010-12-31 60.2
1056318 6164 7497 2010-12-31 107.5
1056319 7533 7492 2010-12-31 164.1
现在我想做的是:我想逐行检查每笔交易,对于 from
列中的每个账户,我想跟踪他们收到的交易金额在进行特定交易时的最后 6 个月,并希望将此信息保存为一个新列。(因此这个新列将描述 from
列中的账户在之前的最后六个月中收到的总交易金额交易日期。)
例如:
在第一行数据中,对于帐户 6644
,如果 6644
在日期 "2004-07-05"-"2005-01-01"
之间有 6 个月的交易,我应该查看 to
列期间一直到 2005-01-01
日期,这是我们在 6644
进行交易的日期。如果有 6644
收到的此类交易,我应该将它们汇总并将此总和信息作为值添加到新列 total_trx_amount_received_in_last_6month
中。同样,我应该为帐户 6753
做同样的事情
并查找它在日期 "2004-07-05"-"2005-01-01"
之间获得的交易并将它们相加并将此总和添加到列 total_trx_amount_received_in_last_6month
。
而且,我应该在数据中逐行以这种方式继续。
那么,我怎样才能对整个数据实现这一点?
PS:日期区间"2004-07-05"-"2005-01-01"
中,"2005-01-01"
为交易日期,得到第二个日期"2004-07-05"
我减去180天(约6个月) ) 从交易日期 "2005-01-01"
.
为了更好看,我提供以下数据:
我还将展示输出结果。假设我们只有这么多交易。只需考虑此处的 5370
帐户,因为其他帐户 8605,6390,8934
未在此处接收任何交易。
id from to date amount total_trx_amount_received_in_last_6month
<int> <fctr> <fctr> <date> <dbl> <dbl>
18529 5370 9356 2005-05-31 24.4 0.0
13742 5370 5605 2005-08-05 7618.0 0.0
9913 5370 8567 2005-09-12 21971.0 0.0
956 8605 5370 2005-10-05 5245.0 0.0
2557 5370 5636 2005-11-12 2921.0 5245.0
1602 6390 5370 2005-11-26 8000.0 0.0
18669 5370 8933 2005-11-30 169.2 (5245.0+8000.0)=13245
35900 5370 8483 2006-01-31 71.5 (5245.0+8000.0)=13245
48667 8934 5370 2006-03-31 14.6 0.0
51341 5370 7626 2006-04-11 4214.0 (8000.0+14.6)=8014.6
这是我所做的:
首先注意上面这个小数据是按date
升序排列的
在第一行中,对于 from column
中的账户 5370
,我查看过去的数据以查看 5370
是否在日期 "2004-12-02"-"2005-05-31"
之间收到任何交易。由于第一行是第一笔交易,显然 "2005-05-31"
之前 5370
没有收到任何交易,所以我将 0.0
分别记录到 total_trx_amount_received_in_last_6month
列。在第二行中,对于 from column
中的账户 5370
,5370
在日期 "2005-02-06"-"2005-08-05"
之间也没有收到任何交易,所以我将 0.0
记录到 total_trx_amount_received_in_last_6month
栏目。同样,我在帐户 5370
和 8605
的第 3 行和第 4 行分别记录 0.0
。在第五行中,对于 from column
中的帐户 5370
,5370
在日期 "2005-05-16"-"2005-11-12"
之间收到了一笔交易,该交易在 "2005-10-05"
中收到(在数据的第 4 行)的数量为 5245.0
,因此我将 5245.0
记录到 total_trx_amount_received_in_last_6month
列中。在第六行,对于 from column
中的账户 6390
,6390
在日期 "2005-05-30"-"2005-11-26"
之间没有收到任何交易,所以我将 0.0
记录到 total_trx_amount_received_in_last_6month
列。所有数据行都是这样。
dput()输出数据:
structure(list(id = c(18529L, 13742L, 9913L, 956L, 2557L, 1602L,
18669L, 35900L, 48667L, 51341L, 53713L, 60126L, 60545L, 65113L,
66783L, 83324L, 87614L, 88898L, 89874L, 94765L, 100277L, 101587L,
103444L, 108414L, 113319L, 121516L, 126607L, 130170L, 131771L,
135002L, 149431L, 157403L, 157645L, 158831L, 162597L, 162680L,
163901L, 165044L, 167082L, 168562L, 168940L, 172578L, 173031L,
173267L, 177507L, 179167L, 182612L, 183499L, 188171L, 189625L,
193940L, 198764L, 199342L, 200134L, 203328L, 203763L, 204733L,
205651L, 209672L, 210242L, 210979L, 214532L, 214741L, 215738L,
216709L, 220828L, 222140L, 222905L, 226133L, 226527L, 227160L,
228193L, 231782L, 232454L, 233774L, 237836L, 237837L, 238860L,
240223L, 245032L, 246673L, 247561L, 251611L, 251696L, 252663L,
254410L, 255126L, 255230L, 258484L, 258485L, 259309L, 259910L,
260542L, 262091L, 264462L, 264887L, 264888L, 266125L, 268574L,
272959L), from = c("5370", "5370", "5370", "8605", "5370", "6390",
"5370", "5370", "8934", "5370", "5635", "6046", "5680", "8026",
"9037", "5370", "7816", "8046", "5492", "8756", "5370", "9254",
"5370", "5370", "7078", "6615", "5370", "9817", "8228", "8822",
"5735", "7058", "5370", "8667", "9315", "6053", "7990", "8247",
"8165", "5656", "9261", "5929", "8251", "5370", "6725", "5370",
"6004", "7022", "7442", "5370", "8679", "6491", "7078", "5370",
"5370", "5370", "5658", "5370", "9296", "8386", "5370", "5370",
"5370", "9535", "5370", "7541", "5370", "9621", "5370", "7158",
"8240", "5370", "5370", "8025", "5370", "5370", "5370", "6989",
"5370", "7059", "5370", "5370", "5370", "9121", "5608", "5370",
"5370", "7551", "5370", "5370", "5370", "5370", "9163", "9362",
"6072", "5370", "5370", "5370", "5370", "5370"), to = c("9356",
"5605", "8567", "5370", "5636", "5370", "8933", "8483", "5370",
"7626", "5370", "5370", "5370", "5370", "5370", "9676", "5370",
"5370", "5370", "5370", "9105", "5370", "9772", "6979", "5370",
"5370", "7564", "5370", "5370", "5370", "5370", "5370", "8744",
"5370", "5370", "5370", "5370", "5370", "5370", "5370", "5370",
"5370", "5370", "7318", "5370", "8433", "5370", "5370", "5370",
"7122", "5370", "5370", "5370", "8566", "6728", "9689", "5370",
"8342", "5370", "5370", "5614", "5596", "5953", "5370", "7336",
"5370", "7247", "5370", "7291", "5370", "5370", "6282", "7236",
"5370", "8866", "8613", "9247", "5370", "6767", "5370", "9273",
"7320", "9533", "5370", "5370", "8930", "9343", "5370", "9499",
"7693", "7830", "5392", "5370", "5370", "5370", "7497", "8516",
"9023", "7310", "8939"), date = structure(c(12934, 13000, 13038,
13061, 13099, 13113, 13117, 13179, 13238, 13249, 13268, 13296,
13299, 13309, 13314, 13391, 13400, 13404, 13409, 13428, 13452,
13452, 13460, 13482, 13493, 13518, 13526, 13537, 13542, 13544,
13596, 13616, 13617, 13626, 13633, 13633, 13639, 13642, 13646,
13656, 13660, 13664, 13667, 13669, 13677, 13686, 13694, 13694,
13707, 13716, 13725, 13738, 13739, 13746, 13756, 13756, 13756,
13761, 13769, 13770, 13776, 13786, 13786, 13786, 13791, 13799,
13806, 13813, 13817, 13817, 13817, 13822, 13829, 13830, 13836,
13847, 13847, 13847, 13852, 13860, 13866, 13871, 13878, 13878,
13878, 13882, 13883, 13883, 13887, 13887, 13888, 13889, 13890,
13891, 13895, 13896, 13896, 13899, 13905, 13909), class = "Date"),
amount = c(24.4, 7618, 21971, 5245, 2921, 8000, 169.2, 71.5,
14.6, 4214, 14.6, 13920, 14.6, 24640, 1600, 261.1, 16400,
3500, 2700, 19882, 182, 14.6, 16927, 25653, 3059, 2880, 9658,
4500, 12480, 14.6, 1000, 3679, 34430, 12600, 14.6, 19.2,
4900, 826, 3679, 2100, 38000, 79, 11400, 21495, 3679, 200,
14.6, 100.6, 3679, 5300, 108.9, 3679, 2696, 7500, 171.6,
14.6, 99.2, 2452, 3679, 3218, 700, 69.7, 14.6, 91.5, 2452,
3679, 2900, 17572, 14.6, 14.6, 90.5, 2452, 49752, 3679, 1900,
14.6, 870, 85.2, 2452, 3679, 1600, 540, 14.6, 14.6, 79, 210,
2452, 28400, 720, 180, 420, 44289, 489, 3679, 840, 2900,
150, 870, 420, 14.6)), row.names = c(NA, -100L), class = "data.frame")
(我将 from
和 to
列转换为字符,因为它们有大量的级别,否则输出将占用这么多 space)
我们可以使用 map2_dbl
并取 amount
中的 sum
位于 6 个月的范围内。
library(dplyr)
library(purrr)
data %>%
mutate(amt = map2_dbl(from, date,
~sum(amount[to == .x & between(date, .y - 180, .y)])))
这是我的交易数据。它显示了从 from
列中的账户到 to
列中的账户的交易,以及日期和金额信息
data
id from to date amount
<int> <fctr> <fctr> <date> <dbl>
19521 6644 6934 2005-01-01 700.0
19524 6753 8456 2005-01-01 600.0
19523 9242 9333 2005-01-01 1000.0
… … … … …
1056317 7819 7454 2010-12-31 60.2
1056318 6164 7497 2010-12-31 107.5
1056319 7533 7492 2010-12-31 164.1
现在我想做的是:我想逐行检查每笔交易,对于 from
列中的每个账户,我想跟踪他们收到的交易金额在进行特定交易时的最后 6 个月,并希望将此信息保存为一个新列。(因此这个新列将描述 from
列中的账户在之前的最后六个月中收到的总交易金额交易日期。)
例如:
在第一行数据中,对于帐户 6644
,如果 6644
在日期 "2004-07-05"-"2005-01-01"
之间有 6 个月的交易,我应该查看 to
列期间一直到 2005-01-01
日期,这是我们在 6644
进行交易的日期。如果有 6644
收到的此类交易,我应该将它们汇总并将此总和信息作为值添加到新列 total_trx_amount_received_in_last_6month
中。同样,我应该为帐户 6753
做同样的事情
并查找它在日期 "2004-07-05"-"2005-01-01"
之间获得的交易并将它们相加并将此总和添加到列 total_trx_amount_received_in_last_6month
。
而且,我应该在数据中逐行以这种方式继续。
那么,我怎样才能对整个数据实现这一点?
PS:日期区间"2004-07-05"-"2005-01-01"
中,"2005-01-01"
为交易日期,得到第二个日期"2004-07-05"
我减去180天(约6个月) ) 从交易日期 "2005-01-01"
.
为了更好看,我提供以下数据:
我还将展示输出结果。假设我们只有这么多交易。只需考虑此处的 5370
帐户,因为其他帐户 8605,6390,8934
未在此处接收任何交易。
id from to date amount total_trx_amount_received_in_last_6month
<int> <fctr> <fctr> <date> <dbl> <dbl>
18529 5370 9356 2005-05-31 24.4 0.0
13742 5370 5605 2005-08-05 7618.0 0.0
9913 5370 8567 2005-09-12 21971.0 0.0
956 8605 5370 2005-10-05 5245.0 0.0
2557 5370 5636 2005-11-12 2921.0 5245.0
1602 6390 5370 2005-11-26 8000.0 0.0
18669 5370 8933 2005-11-30 169.2 (5245.0+8000.0)=13245
35900 5370 8483 2006-01-31 71.5 (5245.0+8000.0)=13245
48667 8934 5370 2006-03-31 14.6 0.0
51341 5370 7626 2006-04-11 4214.0 (8000.0+14.6)=8014.6
这是我所做的:
首先注意上面这个小数据是按date
升序排列的
在第一行中,对于 from column
中的账户 5370
,我查看过去的数据以查看 5370
是否在日期 "2004-12-02"-"2005-05-31"
之间收到任何交易。由于第一行是第一笔交易,显然 "2005-05-31"
之前 5370
没有收到任何交易,所以我将 0.0
分别记录到 total_trx_amount_received_in_last_6month
列。在第二行中,对于 from column
中的账户 5370
,5370
在日期 "2005-02-06"-"2005-08-05"
之间也没有收到任何交易,所以我将 0.0
记录到 total_trx_amount_received_in_last_6month
栏目。同样,我在帐户 5370
和 8605
的第 3 行和第 4 行分别记录 0.0
。在第五行中,对于 from column
中的帐户 5370
,5370
在日期 "2005-05-16"-"2005-11-12"
之间收到了一笔交易,该交易在 "2005-10-05"
中收到(在数据的第 4 行)的数量为 5245.0
,因此我将 5245.0
记录到 total_trx_amount_received_in_last_6month
列中。在第六行,对于 from column
中的账户 6390
,6390
在日期 "2005-05-30"-"2005-11-26"
之间没有收到任何交易,所以我将 0.0
记录到 total_trx_amount_received_in_last_6month
列。所有数据行都是这样。
dput()输出数据:
structure(list(id = c(18529L, 13742L, 9913L, 956L, 2557L, 1602L,
18669L, 35900L, 48667L, 51341L, 53713L, 60126L, 60545L, 65113L,
66783L, 83324L, 87614L, 88898L, 89874L, 94765L, 100277L, 101587L,
103444L, 108414L, 113319L, 121516L, 126607L, 130170L, 131771L,
135002L, 149431L, 157403L, 157645L, 158831L, 162597L, 162680L,
163901L, 165044L, 167082L, 168562L, 168940L, 172578L, 173031L,
173267L, 177507L, 179167L, 182612L, 183499L, 188171L, 189625L,
193940L, 198764L, 199342L, 200134L, 203328L, 203763L, 204733L,
205651L, 209672L, 210242L, 210979L, 214532L, 214741L, 215738L,
216709L, 220828L, 222140L, 222905L, 226133L, 226527L, 227160L,
228193L, 231782L, 232454L, 233774L, 237836L, 237837L, 238860L,
240223L, 245032L, 246673L, 247561L, 251611L, 251696L, 252663L,
254410L, 255126L, 255230L, 258484L, 258485L, 259309L, 259910L,
260542L, 262091L, 264462L, 264887L, 264888L, 266125L, 268574L,
272959L), from = c("5370", "5370", "5370", "8605", "5370", "6390",
"5370", "5370", "8934", "5370", "5635", "6046", "5680", "8026",
"9037", "5370", "7816", "8046", "5492", "8756", "5370", "9254",
"5370", "5370", "7078", "6615", "5370", "9817", "8228", "8822",
"5735", "7058", "5370", "8667", "9315", "6053", "7990", "8247",
"8165", "5656", "9261", "5929", "8251", "5370", "6725", "5370",
"6004", "7022", "7442", "5370", "8679", "6491", "7078", "5370",
"5370", "5370", "5658", "5370", "9296", "8386", "5370", "5370",
"5370", "9535", "5370", "7541", "5370", "9621", "5370", "7158",
"8240", "5370", "5370", "8025", "5370", "5370", "5370", "6989",
"5370", "7059", "5370", "5370", "5370", "9121", "5608", "5370",
"5370", "7551", "5370", "5370", "5370", "5370", "9163", "9362",
"6072", "5370", "5370", "5370", "5370", "5370"), to = c("9356",
"5605", "8567", "5370", "5636", "5370", "8933", "8483", "5370",
"7626", "5370", "5370", "5370", "5370", "5370", "9676", "5370",
"5370", "5370", "5370", "9105", "5370", "9772", "6979", "5370",
"5370", "7564", "5370", "5370", "5370", "5370", "5370", "8744",
"5370", "5370", "5370", "5370", "5370", "5370", "5370", "5370",
"5370", "5370", "7318", "5370", "8433", "5370", "5370", "5370",
"7122", "5370", "5370", "5370", "8566", "6728", "9689", "5370",
"8342", "5370", "5370", "5614", "5596", "5953", "5370", "7336",
"5370", "7247", "5370", "7291", "5370", "5370", "6282", "7236",
"5370", "8866", "8613", "9247", "5370", "6767", "5370", "9273",
"7320", "9533", "5370", "5370", "8930", "9343", "5370", "9499",
"7693", "7830", "5392", "5370", "5370", "5370", "7497", "8516",
"9023", "7310", "8939"), date = structure(c(12934, 13000, 13038,
13061, 13099, 13113, 13117, 13179, 13238, 13249, 13268, 13296,
13299, 13309, 13314, 13391, 13400, 13404, 13409, 13428, 13452,
13452, 13460, 13482, 13493, 13518, 13526, 13537, 13542, 13544,
13596, 13616, 13617, 13626, 13633, 13633, 13639, 13642, 13646,
13656, 13660, 13664, 13667, 13669, 13677, 13686, 13694, 13694,
13707, 13716, 13725, 13738, 13739, 13746, 13756, 13756, 13756,
13761, 13769, 13770, 13776, 13786, 13786, 13786, 13791, 13799,
13806, 13813, 13817, 13817, 13817, 13822, 13829, 13830, 13836,
13847, 13847, 13847, 13852, 13860, 13866, 13871, 13878, 13878,
13878, 13882, 13883, 13883, 13887, 13887, 13888, 13889, 13890,
13891, 13895, 13896, 13896, 13899, 13905, 13909), class = "Date"),
amount = c(24.4, 7618, 21971, 5245, 2921, 8000, 169.2, 71.5,
14.6, 4214, 14.6, 13920, 14.6, 24640, 1600, 261.1, 16400,
3500, 2700, 19882, 182, 14.6, 16927, 25653, 3059, 2880, 9658,
4500, 12480, 14.6, 1000, 3679, 34430, 12600, 14.6, 19.2,
4900, 826, 3679, 2100, 38000, 79, 11400, 21495, 3679, 200,
14.6, 100.6, 3679, 5300, 108.9, 3679, 2696, 7500, 171.6,
14.6, 99.2, 2452, 3679, 3218, 700, 69.7, 14.6, 91.5, 2452,
3679, 2900, 17572, 14.6, 14.6, 90.5, 2452, 49752, 3679, 1900,
14.6, 870, 85.2, 2452, 3679, 1600, 540, 14.6, 14.6, 79, 210,
2452, 28400, 720, 180, 420, 44289, 489, 3679, 840, 2900,
150, 870, 420, 14.6)), row.names = c(NA, -100L), class = "data.frame")
(我将 from
和 to
列转换为字符,因为它们有大量的级别,否则输出将占用这么多 space)
我们可以使用 map2_dbl
并取 amount
中的 sum
位于 6 个月的范围内。
library(dplyr)
library(purrr)
data %>%
mutate(amt = map2_dbl(from, date,
~sum(amount[to == .x & between(date, .y - 180, .y)])))