数据框的复杂重塑,跟踪记录编辑

Complex Reshaping of Data Frame, tracking record edits

我有一个数据框,可以跟踪项目记录何时被编辑、编辑时间、编辑的字段、旧值和新值。 OrderDateProbabilityTotal 列显示这些字段今天的值:

df.raw <- data.frame(project=rep(c('A','B'), each=4),
               createDate=as.Date(rep(c('2015-01-01','2017-05-01'), each=4)),
               editDate=as.Date(c('2018-06-01','2019-04-01','2019-05-01','2019-06-01', '2018-10-01','2018-11-01','2018-11-15','2019-01-01')), 
               editField=c('OrderDate', 'OrderDate','Probability','Probability', 'Total','Total', 'Probability','Total'),
               oldValue=c('2018-06-01','2019-05-01',20,30,500,550,70,400),
               newValue=c('2019-05-01','2019-06-01',30,50,550,400,30,450),
               OrderDate=as.Date(rep(c('2019-06-01','2019-01-01'), each=4)),
               Probability=rep(c(50,70), each=4),
               Total=rep(c(10,450), each=4))

  project createDate   editDate   editField   oldValue   newValue  OrderDate Probability Total
1       A 2015-01-01 2018-06-01   OrderDate 2018-06-01 2019-05-01 2019-06-01          50    10
2       A 2015-01-01 2019-04-01   OrderDate 2019-05-01 2019-06-01 2019-06-01          50    10
3       A 2015-01-01 2019-05-01 Probability         20         30 2019-06-01          50    10
4       A 2015-01-01 2019-06-01 Probability         30         50 2019-06-01          50    10
5       B 2017-05-01 2018-10-01       Total        500        550 2019-01-01          70   450
6       B 2017-05-01 2018-11-01       Total        550        400 2019-01-01          70   450
7       B 2017-05-01 2018-11-15 Probability         70         30 2019-01-01          70   450
8       B 2017-05-01 2019-01-01       Total        400        450 2019-01-01          70   450

我想转换此数据框以便:

最终结果看起来像这样:

df.reshaped <- data.frame(project=rep(c('A','B'), each=5),
           editDate=as.Date(c('2015-01-01','2018-06-01','2019-04-01','2019-05-01','2019-06-01', '2017-05-01', '2018-10-01','2018-11-01','2018-11-15','2019-01-01')),
           editField=c('Created','OrderDate', 'OrderDate','Probability','Probability','Created', 'Total','Total', 'Probability','Total'),
           OrderDateAtEdit=as.Date(c('2018-06-01','2019-05-01','2019-06-01','2019-06-01','2019-06-01',rep('2019-01-01', 5))),
           ProbabilityAtEdit=c(20,20,20,30,50,70,70,70,30,30),
           TotalAtEdit=c(10,10,10,10,10,500,550,400,400,450))
   project   editDate   editField OrderDateAtEdit ProbabilityAtEdit TotalAtEdit
1        A 2015-01-01     Created      2018-06-01                20          10
2        A 2018-06-01   OrderDate      2019-05-01                20          10
3        A 2019-04-01   OrderDate      2019-06-01                20          10
4        A 2019-05-01 Probability      2019-06-01                30          10
5        A 2019-06-01 Probability      2019-06-01                50          10
6        B 2017-05-01     Created      2019-01-01                70         500
7        B 2018-10-01       Total      2019-01-01                70         550
8        B 2018-11-01       Total      2019-01-01                70         400
9        B 2018-11-15 Probability      2019-01-01                30         400
10       B 2019-01-01       Total      2019-01-01                30         450

我不知道从哪里开始,如有任何帮助,我们将不胜感激!谢谢。

我认为数据已合并在一起,您需要将它们拆分为一个事件 table 和一个编辑 table:

library(data.table)
setDT(df.raw)

#created the events table with the available values first
cols <- c("OrderDate", "Probability", "Total")
events <- df.raw[, setnames(rbindlist(.(.(createDate[1L], "Created"), 
    .(editDate, editField))), c("editDate","editField")), project]
events[unique(df.raw, by=c("project", "Probability", "Total")), on=.(project), 
    paste0(cols, "AtEdit") := lapply(mget(cols), as.character)]

#historical edits in another table
edits <- df.raw[, .(startDate=c(createDate[1L], editDate), 
    endDate=c(editDate, as.Date("9999-12-31")),
    value=c(oldValue, newValue[.N])), .(project, editField)]

#perform non-equi joins to update events table
for (x in cols) {
    cn <- paste0(x, "AtEdit")
    v <- edits[editField==x][events, on=.(project, startDate<=editDate, endDate>editDate), value] 
    events[, (cn) := fifelse(is.na(v), get(cn), as.character(v))]  
}

输出:

    project   editDate   editField OrderDateAtEdit ProbabilityAtEdit TotalAtEdit
 1:       A 2015-01-01     Created      2018-06-01                20          10
 2:       A 2018-06-01   OrderDate      2019-05-01                20          10
 3:       A 2019-04-01   OrderDate      2019-06-01                20          10
 4:       A 2019-05-01 Probability      2019-06-01                30          10
 5:       A 2019-06-01 Probability      2019-06-01                50          10
 6:       B 2017-05-01     Created      2019-01-01                70         500
 7:       B 2018-10-01       Total      2019-01-01                70         550
 8:       B 2018-11-01       Total      2019-01-01                70         400
 9:       B 2018-11-15 Probability      2019-01-01                30         400
10:       B 2019-01-01       Total      2019-01-01                30         450