How to get a column of lists of contract payout pairs (date: amount)?
I am trying to add a column of list objects to a data.frame of payments.
ID <- c("A", "B", "B", "c", "A", "B", "c", "c", "A", "B")
Date = seq(as.Date("2000/07/01"), as.Date("2000/07/10"), "days")
Amt <- rnorm(10, 10, 3)
E <- data.frame(Date = Date, ID = ID, Amt = Amt)
         Date ID       Amt
1  2000-07-01  A  6.663256
2  2000-07-02  B 17.084491
3  2000-07-03  B  8.644242
4  2000-07-04  c  4.729045
5  2000-07-05  A  7.345490
6  2000-07-06  B  4.678909
7  2000-07-07  c  8.907506
8  2000-07-08  c  6.194540
9  2000-07-09  A  7.864848
10 2000-07-10  B 11.269177
First, I used dplyr to build a few columns of the summary I need:
library(dplyr)
E.e <- E %>%
  group_by(ID) %>%
  summarise(contract_len = as.numeric(difftime(last(Date), first(Date), units = "days")),
            first_pay = first(Date),
            last_pay = last(Date),
            num_payments = n(),
            payment = sum(Amt))
E.e
Source: local data frame [3 x 6]

  ID contract_len  first_pay   last_pay num_payments  payment
1  A            8 2000-07-01 2000-07-09            3 21.87359
2  B            8 2000-07-02 2000-07-10            4 41.67682
3  c            4 2000-07-04 2000-07-08            3 19.83109
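Now I am trying to add a column of key-value lists, where the key is a date on which a payment was made for a given ID and the value is a numeric object giving the payment on that date.
I have tried these two approaches, but both throw errors I don't quite understand: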
E.g <- E %>%
  group_by(ID) %>%
  mutate(E, stream = list(Date = seq(as.Date(first_pay), as.Date(last_pay)), Pay = Amt))
Error: impossible to replicate vector of size 3
E.e <- E %>%
  group_by(ID) %>%
  summarise(contract_len = as.numeric(difftime(last(Date), first(Date), unit="days")),
            first_pay = first(Date),
            last_pay = last(Date),
            flightpath = list(d=Date, p=Amt),
            num_payments = n(),
            payment = sum(Amt))
Error: expecting a single value
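One plausible reading of the second error, sketched in base R: summarise wants exactly one value per group, and a bare list(d = Date, p = Amt) is a list of length 2, not 1. The vectors below are a hypothetical stand-in for the rows of one group (ID "A" in the sample output above); wrapping the pair in an outer list yields a length-1 object that can sit in a single cell.

```r
# Hypothetical stand-in for one group's rows (ID == "A" in the sample data).
Date <- as.Date(c("2000-07-01", "2000-07-05", "2000-07-09"))
Amt  <- c(6.663256, 7.345490, 7.864848)

# A bare list of two vectors has length 2, so it is not "a single value".
pair <- list(d = Date, p = Amt)
length(pair)

# Wrapping it in an outer list gives a length-1 object holding the whole pair.
cell <- list(pair)
length(cell)

# The original pair is still reachable through the single element.
cell[[1]]$p
```

Whether a given dplyr version accepts such a list inside summarise depends on the release, so treat this only as an illustration of the length mismatch.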
My hacky interim workaround is:
a = dplyr::filter(E, ID == 'A')
b = dplyr::filter(E, ID == 'B')
c = dplyr::filter(E, ID == 'c')
x.a = list(Date = a$Date,Pay = a$Amt)
x.b = list(Date = b$Date,Pay = b$Amt)
x.c = list(Date = c$Date,Pay = c$Amt)
x.a
$Date
[1] "2000-07-01" "2000-07-05" "2000-07-09"
$Pay
[1] 6.663256 7.345490 7.864848
E.e$stream = list(a,b,c)
E.e
Source: local data frame [3 x 7]

  ID contract_len  first_pay   last_pay num_payments  payment          stream
1  A            8 2000-07-01 2000-07-09            3 21.87359 <S3:data.frame>
2  B            8 2000-07-02 2000-07-10            4 41.67682 <S3:data.frame>
3  c            4 2000-07-04 2000-07-08            3 19.83109 <S3:data.frame>
But I obviously can't do this for all 1834 unique contract IDs in my full data set, and I figure I should be able to do it with dplyr...
Not quite sure why you want this, but you can:
library(data.table)
dt = as.data.table(E) # or convert in place using setDT
dt[, .(contract_len = as.numeric(difftime(Date[.N], Date[1], units = 'days')),
       first_pay = Date[1],
       last_pay = Date[.N],
       num_payments = .N,
       payment = sum(Amt),
       summary = list(data.table(Date, Amt)))  # one nested table per ID
   , by = ID]
#   ID contract_len  first_pay   last_pay num_payments  payment      summary
#1:  A            8 2000-07-01 2000-07-09            3 33.44106 <data.table>
#2:  B            8 2000-07-02 2000-07-10            4 37.83217 <data.table>
#3:  c            4 2000-07-04 2000-07-08            3 26.30531 <data.table>
And this is what the summary column looks like when printed:
#[[1]]
#         Date       Amt
#1: 2000-07-01 12.565032
#2: 2000-07-05 14.377863
#3: 2000-07-09  6.498166
#
#[[2]]
#         Date       Amt
#1: 2000-07-02  8.905060
#2: 2000-07-03 10.496663
#3: 2000-07-06  9.989162
#4: 2000-07-10  8.441285
#
#[[3]]
#         Date       Amt
#1: 2000-07-04  6.271645
#2: 2000-07-07  9.937350
#3: 2000-07-08 10.096318
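Since the question asked for dplyr, the same idea can probably be written there too. The following is only a sketch, assuming a reasonably recent dplyr that allows list-columns in summarise; the set.seed call and the res name are additions for illustration and are not part of the original post. The key step mirrors the data.table version: wrap the per-group table in list() so each group contributes a single value.

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
E <- data.frame(
  Date = seq(as.Date("2000-07-01"), as.Date("2000-07-10"), by = "days"),
  ID   = c("A", "B", "B", "c", "A", "B", "c", "c", "A", "B"),
  Amt  = rnorm(10, 10, 3)
)

res <- E %>%
  group_by(ID) %>%
  summarise(
    contract_len = as.numeric(difftime(last(Date), first(Date), units = "days")),
    first_pay    = first(Date),
    last_pay     = last(Date),
    num_payments = n(),
    payment      = sum(Amt),
    # list() around the per-group data.frame makes it one value per group
    stream       = list(data.frame(Date = Date, Amt = Amt))
  )

# Pull out the payment stream for one contract
res$stream[[which(res$ID == "A")]]
```

Each element of res$stream is an ordinary data.frame, so it can be inspected or passed to other functions one contract at a time.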