Aggregate debit and credit amounts by day and group by account

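I have a table containing a debit amount, credit amount, debit date, credit date, and account id. Whenever a row has a debit amount, its credit amount is empty (NA), and vice versa. I need the total debit and total credit per account for each day.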

id Debit_date Debit_amount Credit_date Credit_amount
1 2018-10-21 20000 NA NA
1 NA NA 2018-10-21 50000
2 2019-1-2 10000 NA NA
2 2019-1-3 20000 NA NA
4 NA NA 2019-1-4 30000
1 2019-1-5 1000 NA NA

I need to get the following output:

id Trans_date Total_debit Total_credit
1 2018-10-21 20000 50000
1 2019-1-5 1000 NA
2 2019-1-2 30000 NA
4 2019-1-4 NA 30000

I tried the following code:

df_db = df %>%  group_by(id,debit_date) %>% summarise(total_debit=sum(debit_amount))
df_cr = df %>%  group_by(id,credit_date) %>% summarise(total_credit=sum(credit_amount))

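I then went on to join these two data frames, but the join blew up because I have millions of transactions. Can anyone guide me on how to get the data shown in the output above? Many thanks.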

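Since each row has either a debit date or a credit date, you can use coalesce() to merge them into a single transaction date and group by that: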

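library(dplyr)

df %>% 
  # coalesce() takes Debit_date where present, otherwise Credit_date
  group_by(id, Trans_date = coalesce(Debit_date, Credit_date)) %>% 
  # na.rm = TRUE drops the empty side, so an all-NA group sums to 0
  summarise(Total_debit = sum(Debit_amount, na.rm = TRUE),
            Total_credit = sum(Credit_amount, na.rm = TRUE))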

     id Trans_date Total_debit Total_credit
1     1 2018-10-21       20000        50000
2     1 2019-1-5          1000            0
3     2 2019-1-2         30000            0
4     4 2019-1-4             0        30000
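The totals above come out as 0 where the question's expected output shows NA, because sum(..., na.rm = TRUE) of an all-NA group is 0. If NA is preferred, here is a minimal sketch of one way to do it. It assumes integer amounts as in the sample data and dplyr 1.0.0 or later (for the .groups argument); sum_or_na is a hypothetical helper written for this example, not a dplyr function.

```r
library(dplyr)

# Sample data from the question, with the second id 2 debit dated 2019-1-2
df <- data.frame(
  id = c(1L, 1L, 2L, 2L, 4L, 1L),
  Debit_date = c("2018-10-21", NA, "2019-1-2", "2019-1-2", NA, "2019-1-5"),
  Debit_amount = c(20000L, NA, 10000L, 20000L, NA, 1000L),
  Credit_date = c(NA, "2018-10-21", NA, NA, "2019-1-4", NA),
  Credit_amount = c(NA, 50000L, NA, NA, 30000L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return NA when every value in the group is missing,
# otherwise sum the non-missing values
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
}

result <- df %>%
  group_by(id, Trans_date = coalesce(Debit_date, Credit_date)) %>%
  summarise(Total_debit = sum_or_na(Debit_amount),
            Total_credit = sum_or_na(Credit_amount),
            .groups = "drop")  # return an ungrouped result

print(result)
```

With the sample data this should give one row per id and date, with NA rather than 0 on the side that has no transactions.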

Data (I changed the Debit_date of the second id 2 row from 2019-1-3 to 2019-1-2 to match the expected output):

df <- structure(list(id = c(1L, 1L, 2L, 2L, 4L, 1L), Debit_date = c("2018-10-21", 
NA, "2019-1-2", "2019-1-2", NA, "2019-1-5"), Debit_amount = c(20000L, 
NA, 10000L, 20000L, NA, 1000L), Credit_date = c(NA, "2018-10-21", 
NA, NA, "2019-1-4", NA), Credit_amount = c(NA, 50000L, NA, NA, 
30000L, NA)), class = "data.frame", row.names = c(NA, -6L))