How to modify this data.table code to show balance transitions instead of event frequency transitions?

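I'm using the MWE code below to generate a data frame of transition frequencies. It works well and is fast. I'm new to the data.table package and can't work out how to convert it to show balance transitions.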

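First, below are the example data frame, the transition frequency output from running the function (using the two time measures "Period_1" and "Period_2"), and the underlying MWE code for the functions, all of which work as intended for transition frequencies: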

> data
   ID Period_1 Period_2 Values State
1:  1        1  2020-01      5    X0
2:  1        2  2020-02     10    X1
3:  1        3  2020-03     15    X2
4:  2        1  2020-04      0    X0
5:  2        2  2020-05      2    X2
6:  2        3  2020-06      4    X0
7:  3        1  2020-02      3    X2
8:  3        2  2020-03      6    X1
9:  3        3  2020-04      9    X0

> setDT(data)
> num_transit(data, "2020-02", "2020-04",refvar="Period_2")
   to_state X0 X1 X2
1:       X0 NA NA  1
2:       X1 NA NA NA
3:       X2 NA NA NA

> setDT(data)
> num_transit(data, 1,3, refvar="Period_1")
   to_state X0 X1 X2
1:       X0  1 NA  1
2:       X1 NA NA NA
3:       X2  1 NA NA

library(data.table)
   
data <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
  )
    
num_transit <- function(x,from,to,refvar="Period_2", return_matrix=T) {
  res <- x[get(refvar) %in% c(to,from), if(.N>1) .SD, by=ID, .SDcols = c(refvar, "State")]
  res <- res[, id:=1:.N, by=ID]
  res <- dcast(res, ID~id, value.var="State")[,.N, .(`1`,`2`)]
  setnames(res,c("from","to", "ct"))
  if(return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
  res
}
    
convert_transits_to_matrix <- function(transits,states) {
  m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
  m[as.matrix(transits[,.(to,from)])] <- transits$ct
  m = data.table(m)[,to_state:=rownames(m)]
  setcolorder(m,"to_state")
  return(m[])
}

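This is where I need help. I'm trying to modify the above (call it "val_transit") to show the transition of "Values" into the new state. The output would look like this, using the data data frame and running the transition over Period_1 from 1 to 3 (i.e. val_transit(data, 1,3, refvar="Period_1")):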

   to_state X0 X1 X2
1:       X0  4 NA  9
2:       X1 NA NA NA
3:       X2 15 NA NA

Any suggestions? This is a follow-up to the transition frequency post.

Sure, here is an updated version of the earlier num_transit function. Note the differences:

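  1. .SDcols in the first line of the function includes both State and Values
  2. value.var in the dcast call includes both State and Values
  3. as a result of (2) above, I explicitly group on State_1 and State_2, rather than on `1` and `2`, and the summarizing operation is the sum of Values (specifically Values_2, the value in the second period)
  4. the setnames call is adjusted so that the last column returned is Values when return_matrix=F
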
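val_transit <- function(x, from, to, refvar = "Period_2", return_matrix = TRUE) {
  # Keep only IDs with more than one matching row (i.e. present in both periods)
  res <- x[get(refvar) %in% c(to, from), if (.N > 1) .SD, by = ID, .SDcols = c(refvar, "State", "Values")]
  # Number the rows within each ID in row order: 1 = first period, 2 = second
  res <- res[, id := 1:.N, by = ID]
  # Cast wide (State_1, State_2, Values_1, Values_2), then sum the second-period values per state pair
  res <- dcast(res, ID ~ id, value.var = c("State", "Values"))[, .(Values = sum(Values_2, na.rm = TRUE)), .(State_1, State_2)]
  setnames(res, c("from", "to", "Values"))
  if (return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
  res
}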
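
To make the dcast step less opaque, here is a minimal sketch of what it produces. The toy table dt and the names wide and long are made up for illustration; the rows mimic the example data after filtering Period_1 to 1 and 3.

```r
library(data.table)

# Hypothetical toy rows: two rows per ID, already filtered to the two periods
dt <- data.table(
  ID     = c(1, 1, 2, 2, 3, 3),
  State  = c("X0", "X2", "X0", "X0", "X2", "X0"),
  Values = c(5, 15, 0, 4, 3, 9)
)
dt[, id := seq_len(.N), by = ID]  # 1 = "from" row, 2 = "to" row within each ID

# Casting two value columns at once gives one column per (value.var, id) pair
wide <- dcast(dt, ID ~ id, value.var = c("State", "Values"))
print(names(wide))

# Sum the second-period values for each (from state, to state) pair
long <- wide[, .(Values = sum(Values_2, na.rm = TRUE)), by = .(State_1, State_2)]
print(long)
```

This is why the grouping columns are State_1 and State_2 here, whereas the single value.var in num_transit leaves the cast columns named `1` and `2`.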

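Note that below I've made a small update to my convert_transits_to_matrix function, so that this helper can be used by both val_transit() and num_transit(). The minor update is on the second line of the function body, where I now use transits[[3]], so it works regardless of the actual name of the third column in the transits object.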

convert_transits_to_matrix <- function(transits,states) {
  m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
  m[as.matrix(transits[,.(to,from)])] <- transits[[3]]
  m = data.table(m)[,to_state:=rownames(m)]
  setcolorder(m,"to_state")
  return(m[])
}
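
The fill step in the helper relies on base R allowing a two-column character matrix as an index into a matrix with dimnames. A minimal sketch of that mechanism on its own, with made-up names states, idx and m and the values from the Period_1 example:

```r
states <- c("X0", "X1", "X2")
m <- matrix(NA_real_, nrow = 3, ncol = 3, dimnames = list(states, states))

# Each row of idx is a (row name, column name) pair, here (to, from)
idx <- cbind(to = c("X2", "X0", "X0"), from = c("X0", "X0", "X2"))

# One value is assigned per index row; all other cells stay NA
m[idx] <- c(15, 4, 9)
print(m)
```

Rows are therefore the destination state and columns the origin state, which matches the to_state column in the output.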

Usage:

val_transit(data,"2020-02","2020-04", "Period_2")
   to_state    X0    X1    X2
     <char> <num> <num> <num>
1:       X0    NA    NA     9
2:       X1    NA    NA    NA
3:       X2    NA    NA    NA

val_transit(data,1,3, "Period_1")

   to_state    X0    X1    X2
     <char> <num> <num> <num>
1:       X0     4    NA     9
2:       X1    NA    NA    NA
3:       X2    15    NA    NA

Make sure your data has been converted with setDT(data) before passing it to these functions.
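
As a quick check of that last point, a small sketch (df is a made-up example object) showing that setDT() converts a data.frame in place:

```r
library(data.table)

df <- data.frame(ID = c(1, 1), State = c("X0", "X1"))
before <- is.data.table(df)  # FALSE: still a plain data.frame
setDT(df)                    # converts by reference, no reassignment needed
after <- is.data.table(df)   # TRUE
print(c(before = before, after = after))
```

The functions above use data.table syntax inside x[...], so a plain data.frame would not work with them as written.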