Aggregate, sort and calculate weighted average using R
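I have two data frames. One holds the invoice details (df_inv) and the other holds the collection details for those invoices (df_coll).
One invoice can have multiple collections/vouchers.
The invoice table has about 30 columns; for this calculation we only look at 3 of them (invoice number, expected amount, due date).
Similarly, the collections table has many variables; for this case we consider 3 columns (invoice number, voucher date, credit amount).
PS: a $300 invoice may be paid with 3 vouchers ($100 each) on 3 different dates. The credited amount can also be less than or greater than the expected amount.
Based on the (unique) invoice number in the invoice table, I need to find the corresponding vouchers in the collections table, sort them by voucher date in ascending order, find the payment delay (df_coll$VoucherDate - df_inv$DueDate), and then calculate the weighted average for each invoice.
x4 in df_inv has no corresponding entry in df_coll, so it should return NA.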
Weighted average calculation (1 invoice with 2 payment vouchers):
((1st pymt amt * 1st delay days) + (2nd pymt amt * 2nd delay days)) / ((% of total credited amount) * (expected amount))
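As a rough illustration of the formula above, here is a minimal base R sketch. The function name weighted_delay and the numbers are made up for the example. It assumes that "% of total credited amount" means the total credited amount divided by the expected amount; under that reading the denominator is simply the total credited amount.

```r
# Hypothetical helper, not part of the original question.
weighted_delay <- function(amounts, delay_days, expected) {
  # Assumed meaning of "% of total credited amount": credited / expected
  pct_credited <- sum(amounts) / expected
  sum(amounts * delay_days) / (pct_credited * expected)
}

# Made-up invoice: expected 400, paid 100 after 10 days and 200 after 40 days
res <- weighted_delay(amounts = c(100, 200), delay_days = c(10, 40), expected = 400)
print(res)  # (100*10 + 200*40) / (0.75 * 400) = 9000 / 300 = 30
```

Because the expected amount cancels out, the result is the average delay in days, weighted by the amount of each payment.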
Sample data below.

Invoice table (df_inv)

Invoice No   Expected Amount   Due Date
x1           1400              02-01-2012
x2           850               20-04-2012
x3           1300              30-09-2012
x4           1500              25-01-2013

Collections table (df_coll)

Invoice No   Voucher Date   Credit Amount
x1           26-11-2012     100
x2           24-10-2012     200
x1           11-05-2012     300
x1           22-08-2013     100
x2           12-07-2013     500
x3           30-01-2014     600
x2           24-06-2012     100
x3           31-11-2012     700
x1           29-02-2012     800
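One thing worth checking in the sample data: the voucher date 31-11-2012 is not a real calendar date, since November has 30 days. A small sketch of how such entries could be spotted in base R, relying on as.Date() returning NA when a string does not parse to a valid date in the given format. The variable names raw_dates, parsed and bad are only illustrative.

```r
# The two x3 voucher dates exactly as posted in the table above
raw_dates <- c("30-01-2014", "31-11-2012")

# Dates that cannot be parsed as dd-mm-yyyy come back as NA
parsed <- as.Date(raw_dates, format = "%d-%m-%Y")

# Keep the original strings that failed to parse
bad <- raw_dates[is.na(parsed)]
print(bad)
```

Any string flagged this way needs to be corrected at the source before delays are computed, otherwise the NA propagates into the sums for that invoice.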
Here is a possible solution using base R only:
#################### Recreate your input data.frames ####################
df_inv <-
  data.frame(InvoiceNo=c("x1","x2","x3","x4"),
             Expected=c(1400,850,1300,1500),
             AmountDueDate=c("02-01-2012","20-04-2012","30-09-2012","25-01-2013"),
             stringsAsFactors=FALSE)
# Note: the question lists 31-11-2012 for x3, which is not a valid date;
# 30-11-2012 is used here instead.
df_coll <-
  data.frame(InvoiceNo=c("x1","x2","x1","x1","x2","x3","x2","x3","x1"),
             VoucherDate=c("26-11-2012","24-10-2012","11-05-2012","22-08-2013",
                           "12-07-2013","30-01-2014","24-06-2012","30-11-2012","29-02-2012"),
             CreditAmount=c(100,200,300,100,500,600,100,700,800),
             stringsAsFactors=FALSE)
# Convert the dd-mm-yyyy strings to Date objects
df_inv$AmountDueDate <- as.Date(df_inv$AmountDueDate,format='%d-%m-%Y')
df_coll$VoucherDate <- as.Date(df_coll$VoucherDate,format='%d-%m-%Y')
###########################################################################
# Left join: keep every invoice, including those without vouchers (x4)
m <- merge(df_inv,df_coll,by="InvoiceNo",all.x=TRUE,all.y=FALSE)
# Per voucher: credit amount * payment delay in days
m$CrdAmntWeighted <- m$CreditAmount * as.numeric(m$VoucherDate - m$AmountDueDate)
# Per-invoice totals, repeated on every row of that invoice
m$TotCredAmnt <- ave(m$CreditAmount,m$InvoiceNo,FUN=sum)
m$TotCrdAmntWeighted <- ave(m$CrdAmntWeighted,m$InvoiceNo,FUN=sum)
# (TotCredAmnt / Expected) is the "% of total credited amount" from the question,
# so the denominator reduces to TotCredAmnt
m$WeightedAvg <- m$TotCrdAmntWeighted / ((m$TotCredAmnt / m$Expected) * m$Expected)
# Keep one row per invoice
final <- m[!duplicated(m$InvoiceNo),c('InvoiceNo','Expected','TotCredAmnt','WeightedAvg')]
> final
   InvoiceNo Expected TotCredAmnt WeightedAvg
1         x1     1400        1300    137.0000
5         x2      850         800    334.8750
8         x3     1300        1300    257.6154
10        x4     1500          NA          NA
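The question also asks for the vouchers to be sorted by voucher date. Sorting does not change the sums, but if the per-voucher delays are wanted in date order, something along these lines might work. This is only a sketch: the column names DueDate and DelayDays and the objects detail and weighted are choices made for this example, and 30-11-2012 is used in place of the invalid 31-11-2012, as in the answer.

```r
# Sample data as in the answer (dates parsed as dd-mm-yyyy)
df_inv <- data.frame(
  InvoiceNo = c("x1", "x2", "x3", "x4"),
  Expected  = c(1400, 850, 1300, 1500),
  DueDate   = as.Date(c("02-01-2012", "20-04-2012", "30-09-2012", "25-01-2013"),
                      format = "%d-%m-%Y"),
  stringsAsFactors = FALSE
)
df_coll <- data.frame(
  InvoiceNo    = c("x1", "x2", "x1", "x1", "x2", "x3", "x2", "x3", "x1"),
  VoucherDate  = as.Date(c("26-11-2012", "24-10-2012", "11-05-2012", "22-08-2013",
                           "12-07-2013", "30-01-2014", "24-06-2012", "30-11-2012",
                           "29-02-2012"), format = "%d-%m-%Y"),
  CreditAmount = c(100, 200, 300, 100, 500, 600, 100, 700, 800),
  stringsAsFactors = FALSE
)

# Left join so that invoices without vouchers (x4) are kept with NA values
detail <- merge(df_inv, df_coll, by = "InvoiceNo", all.x = TRUE)

# Sort by invoice, then by voucher date ascending
detail <- detail[order(detail$InvoiceNo, detail$VoucherDate), ]
rownames(detail) <- NULL

# Payment delay in days for each voucher
detail$DelayDays <- as.numeric(detail$VoucherDate - detail$DueDate)

# Amount-weighted average delay per invoice; x4 stays NA
weighted <- sapply(split(detail, detail$InvoiceNo), function(d) {
  sum(d$CreditAmount * d$DelayDays) / sum(d$CreditAmount)
})

print(detail)
print(weighted)
```

With the sample data, the values in weighted should agree with the WeightedAvg column of final above, while detail additionally shows each voucher and its delay in date order.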