How to create a table that measures transitions of elements over calendar periods?

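I have a function that generates a transition table: it counts how the states of elements change over time, measured from each element's first appearance ("Period_1" in the example data frame below). The code is shown below: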

library(data.table)

data <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    State = c("X0","X1","X2","X0","X2","X0", "X2","X1","X0")
  )

numTransit <- function(x, from=1, to=3){
  setDT(x)
  unique_state <- unique(x$State)
  all_states <- setDT(expand.grid(list(from_state = unique_state, to_state = unique_state)))
  dcast(x[, .(from_state = State[from], 
              to_state = State[to]), 
          by = ID]
        [,.N, c("from_state", "to_state")]
        [all_states,on = c("from_state", "to_state")], 
        to_state ~ from_state, value.var = "N"
  )
}

numTransit(data,1,3)

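However, when deploying it in the fuller code, I am also trying to give the user the option of computing transitions over calendar periods ("Period_2" in the data frame). If the user wants to see the transitions from 2020-02 to 2020-04, the output would look like the table below. Only one element, ID = 3, spans 2020-02 to 2020-04, so the transition table shows a single transition, and that element moved from state X2 to state X0 over the period: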

> numTransit(data,"2020-02","2020-04")
   to_state X0 X1 X2
1:       X0 NA NA  1
2:       X1 NA NA NA
3:       X2 NA NA NA

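Is there a way to do this? I am new to data.table, but I decided to use it for speed: this function is run against millions of rows of data, and it produces results in a fraction of a second. This post is a follow-up to an earlier post.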

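Here is an alternative definition of the numTransit function.

(Update: I moved the matrix conversion out of this function and into convert_transits_to_matrix.)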

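num_transit <- function(x, from, to, refvar = "Period_2", return_matrix = TRUE) {
  # keep only rows at the two periods of interest, and only IDs that have more than one such row
  res <- x[get(refvar) %in% c(to, from), if (.N > 1) .SD, by = ID, .SDcols = c(refvar, "State")]
  # number the rows within each ID: 1 = from, 2 = to (relies on rows being ordered by period within ID)
  res <- res[, id := seq_len(.N), by = ID]
  # one row per ID holding its from/to states, then count each (from, to) pair
  res <- dcast(res, ID ~ id, value.var = "State")[, .N, .(`1`, `2`)]
  setnames(res, c("from", "to", "ct"))
  if (return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
  res
}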

convert_transits_to_matrix <- function(transits,states) {
  m = matrix(NA, nrow=length(states), ncol=length(states), dimnames=list(states,states))
  m[as.matrix(transits[,.(to,from)])] <- transits$ct
  m = data.table(m)[,to_state:=rownames(m)]
  setcolorder(m,"to_state")
  return(m[])
}
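
The fill step in convert_transits_to_matrix uses base R's indexing of a matrix by a two-column character matrix, where each row names one (row, column) cell through the dimnames. A minimal standalone sketch of that idiom, with made-up counts (the names `states`, `idx` and `m` here are only for illustration):

```r
states <- c("X0", "X1", "X2")

# empty 3 x 3 matrix: rows are "to" states, columns are "from" states
m <- matrix(NA_integer_, nrow = 3, ncol = 3, dimnames = list(states, states))

# each row of idx addresses one cell as (to, from)
idx <- cbind(to = c("X0", "X2"), from = c("X2", "X0"))

# illustrative counts only: one X2 -> X0 transition and two X0 -> X2 transitions
m[idx] <- c(1L, 2L)

m
```

Cells that are never addressed stay NA, which is why the tables in this answer show NA rather than 0 for transitions that did not occur.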

Usage:

setDT(data)
num_transit(data, "2020-02", "2020-04")

   to_state    X0    X1    X2
     <char> <int> <int> <int>
1:       X0    NA    NA     1
2:       X1    NA    NA    NA
3:       X2    NA    NA    NA

num_transit(data, 1,3, refvar="Period_1")

   to_state    X0    X1    X2
     <char> <int> <int> <int>
1:       X0     1    NA     1
2:       X1    NA    NA    NA
3:       X2     1    NA    NA
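
One caveat: num_transit decides which state is "from" and which is "to" by row order within each ID, so it expects the data to be sorted by period. If that cannot be guaranteed, a possible variant is to look up the two periods separately and join them on ID. The sketch below is only an illustration under that assumption; `count_transits` is a hypothetical name, not part of the code above, and it returns the long (from, to, ct) table rather than the matrix. It also assumes the data has no columns literally named `from` or `to`, since those would shadow the function arguments inside `[`.

```r
library(data.table)

# hypothetical helper: pair each ID's state at `from` with its state at `to`
count_transits <- function(x, from, to, refvar = "Period_2") {
  from_dt <- x[get(refvar) == from, .(ID, from = State)]
  to_dt   <- x[get(refvar) == to,   .(ID, to = State)]
  # inner join keeps only IDs observed at both periods
  merge(from_dt, to_dt, by = "ID")[, .(ct = .N), by = .(from, to)]
}

# same example data as in the question
data <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Period_2 = c("2020-01", "2020-02", "2020-03", "2020-04", "2020-05",
               "2020-06", "2020-02", "2020-03", "2020-04"),
  State = c("X0", "X1", "X2", "X0", "X2", "X0", "X2", "X1", "X0")
)

res_cal <- count_transits(data, "2020-02", "2020-04")
res_rel <- count_transits(data, 1, 3, refvar = "Period_1")
```

The result can be passed to convert_transits_to_matrix(res_cal, unique(data$State)) to get the same wide layout as above.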