How to get a column of lists of contract payout pairs (date: amount)?

I am trying to add a column of list objects to a data.frame of payments.

ID <- c("A", "B", "B", "c", "A", "B", "c", "c", "A", "B")
Date <- seq(as.Date("2000/07/01"), as.Date("2000/07/10"), "days")
Amt <- rnorm(10, 10, 3)

E <- data.frame(Date = Date, ID = ID, Amt = Amt)

         Date ID       Amt
1  2000-07-01  A  6.663256
2  2000-07-02  B 17.084491
3  2000-07-03  B  8.644242
4  2000-07-04  c  4.729045
5  2000-07-05  A  7.345490
6  2000-07-06  B  4.678909
7  2000-07-07  c  8.907506
8  2000-07-08  c  6.194540
9  2000-07-09  A  7.864848
10 2000-07-10  B 11.269177

First, I used dplyr to build a few summary columns that I need:

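library(dplyr)

# One row per contract ID with its length, first/last payment date, count and total
E.e <- E %>%
  group_by(ID) %>%
  summarise(contract_len = as.numeric(difftime(last(Date), first(Date), units = "days")),
            first_pay = first(Date),
            last_pay = last(Date),
            num_payments = n(),
            payment = sum(Amt))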

E.e
Source: local data frame [3 x 6]

  ID contract_len  first_pay   last_pay num_payments  payment
1  A            8 2000-07-01 2000-07-09            3 21.87359
2  B            8 2000-07-02 2000-07-10            4 41.67682
3  c            4 2000-07-04 2000-07-08            3 19.83109

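Now I am trying to add a column holding a list of key-value pairs, where the key is a date on which a payment was made for the given ID and the value is a numeric object describing the payment on that date.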

I have tried the following two approaches, but both throw errors that I don't quite understand...

E.g <- E %>%
     group_by(ID) %>%
     mutate(E, stream = list( Date = seq(as.Date(first_pay), as.Date(last_pay)), Pay = Amt))
Error: impossible to replicate vector of size 3

E.e <- E %>%
     group_by(ID) %>% 
     summarise(contract_len = as.numeric(difftime(last(Date), first(Date), unit="days")),
               first_pay = first(Date),
               last_pay =last(Date),
               flightpath = list(d=Date, p=Amt),
               num_payments = n(),
               payment = sum(Amt))
Error: expecting a single value

My hacky temporary solution is:

a = dplyr::filter(E, ID == 'A')
b = dplyr::filter(E, ID == 'B')
c = dplyr::filter(E, ID == 'c')

x.a = list(Date = a$Date,Pay = a$Amt)
x.b = list(Date = b$Date,Pay = b$Amt)
x.c = list(Date = c$Date,Pay = c$Amt)


x.a
$Date
[1] "2000-07-01" "2000-07-05" "2000-07-09"

$Pay
[1] 6.663256 7.345490 7.864848

E.e$stream = list(a,b,c)

E.e
Source: local data frame [3 x 7]

  ID contract_len  first_pay   last_pay num_payments  payment      stream
1  A            8 2000-07-01 2000-07-09            3 21.87359 <S3:data.frame>
2  B            8 2000-07-02 2000-07-10            4 41.67682 <S3:data.frame>
3  c            4 2000-07-04 2000-07-08            3 19.83109 <S3:data.frame>

But I obviously can't do this by hand for all 1834 unique contract IDs in my full dataset, and I think I should be able to do it with dplyr...

Not quite sure why you would want this, but you can do it with data.table:

library(data.table)
dt = as.data.table(E) # or convert in place using setDT

dt[, .(contract_len = as.numeric(difftime(Date[.N], Date[1], units = 'days')),
       first_pay = Date[1],
       last_pay = Date[.N],
       num_payments = .N,
       payment = sum(Amt),
       summary = list(data.table(Date, Amt)))
   , by = ID]
#   ID contract_len  first_pay   last_pay num_payments  payment      summary
#1:  A            8 2000-07-01 2000-07-09            3 33.44106 <data.table>
#2:  B            8 2000-07-02 2000-07-10            4 37.83217 <data.table>
#3:  c            4 2000-07-04 2000-07-08            3 26.30531 <data.table>

This is what the summary column looks like when printed:

#[[1]]
#         Date       Amt
#1: 2000-07-01 12.565032
#2: 2000-07-05 14.377863
#3: 2000-07-09  6.498166
#
#[[2]]
#         Date       Amt
#1: 2000-07-02  8.905060
#2: 2000-07-03 10.496663
#3: 2000-07-06  9.989162
#4: 2000-07-10  8.441285
#
#[[3]]
#         Date       Amt
#1: 2000-07-04  6.271645
#2: 2000-07-07  9.937350
#3: 2000-07-08 10.096318
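
Since the question asked for dplyr specifically, here is a minimal sketch of the same idea there, assuming a reasonably recent dplyr release (the older version used in the question may behave differently). The key point is the same as in the data.table code above: wrap the per-group table in list(), so each group contributes exactly one list element. In the second attempt from the question, list(d = Date, p = Amt) is itself a list of length 2, which is most likely why summarise() complained about not getting a single value. The set.seed() call and the data.frame(Date, Amt) layout of the nested table are my own choices for illustration, not something from the original post.

```r
library(dplyr)

# Same shape as the data in the question; set.seed() is added only to make
# this sketch reproducible (the question's printed values used no seed).
set.seed(1)
E <- data.frame(
  Date = seq(as.Date("2000/07/01"), as.Date("2000/07/10"), "days"),
  ID   = c("A", "B", "B", "c", "A", "B", "c", "c", "A", "B"),
  Amt  = rnorm(10, 10, 3)
)

E.e <- E %>%
  group_by(ID) %>%
  summarise(
    contract_len = as.numeric(difftime(last(Date), first(Date), units = "days")),
    first_pay    = first(Date),
    last_pay     = last(Date),
    num_payments = n(),
    payment      = sum(Amt),
    # list() around the data frame gives one list element per group,
    # which is the single value per group that summarise() expects
    stream       = list(data.frame(Date = Date, Amt = Amt))
  )

# Payment stream of the first contract ("A")
E.e$stream[[1]]
```

Each element of E.e$stream is then an ordinary data frame, so E.e$stream[[2]]$Amt gives the amounts paid under the second contract, and no per-ID filter() calls are needed for the 1834 contracts.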