How to create a table that measures transitions of elements over calendar periods?
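I have a function that generates a transition table. It counts the transitions of element states over time, measured from each element's first appearance ("Period_1" in the example data frame below). The code is shown below: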
library(data.table)
data <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Period_2 = c("2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06", "2020-02", "2020-03", "2020-04"),
  Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
  State = c("X0", "X1", "X2", "X0", "X2", "X0", "X2", "X1", "X0")
)
numTransit <- function(x, from = 1, to = 3) {
  setDT(x)
  unique_state <- unique(x$State)
  # Every from/to combination, so unobserved transitions still appear (as NA)
  all_states <- setDT(expand.grid(list(from_state = unique_state, to_state = unique_state)))
  dcast(
    # State at row positions `from` and `to` within each ID, then count each pair
    x[, .(from_state = State[from], to_state = State[to]), by = ID
    ][, .N, c("from_state", "to_state")
    ][all_states, on = c("from_state", "to_state")],
    to_state ~ from_state, value.var = "N"
  )
}
numTransit(data,1,3)
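To make the limitation concrete, here is a minimal sketch of what `State[from]` and `State[to]` do inside each ID group. The toy table `dt` is a hypothetical cut-down version of the example data, not part of the original code. The indices are row positions within the group, which is why a calendar label such as "2020-02" cannot be passed directly.

```r
library(data.table)

# Hypothetical toy data: the first two IDs from the example data frame
dt <- data.table(
  ID    = c(1, 1, 1, 2, 2, 2),
  State = c("X0", "X1", "X2", "X0", "X2", "X0")
)

# Positional lookup: the 1st and 3rd row of each ID group
pairs <- dt[, .(from_state = State[1], to_state = State[3]), by = ID]
print(pairs)

# Indexing an unnamed vector with a string finds no match and returns NA
by_name <- c("X0", "X1", "X2")["2020-02"]
print(is.na(by_name))
```

So a calendar-based version has to filter on the period column first instead of indexing by position.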
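However, when deploying this in my fuller code, I also want to give the user the option of counting transitions over calendar periods ("Period_2" in the data frame). If the user wants to see the transitions from 2020-02 to 2020-04, the output should look like the table below. Only one element, ID = 3, is present in both 2020-02 and 2020-04, so the resulting transition table shows a single transition, and that element moved from state X2 to state X0 over the period: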
> numTransit(data,"2020-02","2020-04")
   to_state X0 X1 X2
1:       X0 NA NA  1
2:       X1 NA NA NA
3:       X2 NA NA NA
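Is there a way to do this? I'm new to data.table, but I chose it for speed: this function is run against millions of rows of data and it produces results in a fraction of a second. This post is a follow-up to an earlier post.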
Here is an alternative definition of the numTransit function.
(Update: I moved convert_transits_to_matrix out of this function.)
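num_transit <- function(x, from, to, refvar = "Period_2", return_matrix = TRUE) {
  # Keep rows at the two periods of interest, only for IDs that have more than one such row
  # (assumes one row per ID and period)
  res <- x[get(refvar) %in% c(to, from), if (.N > 1) .SD, by = ID, .SDcols = c(refvar, "State")]
  # Number each ID's rows in order of appearance: 1 = from, 2 = to
  # (assumes rows are sorted by period within each ID)
  res <- res[, id := 1:.N, by = ID]
  # One row per ID holding its from/to states, then count each pair
  res <- dcast(res, ID ~ id, value.var = "State")[, .N, .(`1`, `2`)]
  setnames(res, c("from", "to", "ct"))
  if (return_matrix) return(convert_transits_to_matrix(res, unique(x$State)))
  res
}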
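convert_transits_to_matrix <- function(transits, states) {
  # Square NA matrix with the states as both row (to) and column (from) names
  m <- matrix(NA, nrow = length(states), ncol = length(states), dimnames = list(states, states))
  # Index by (to, from) name pairs to fill in the observed counts
  m[as.matrix(transits[, .(to, from)])] <- transits$ct
  m <- data.table(m)[, to_state := rownames(m)]
  setcolorder(m, "to_state")
  return(m[])
}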
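The fill step in convert_transits_to_matrix relies on base R's indexing of a matrix by a two-column character matrix, where each row names one (row, column) cell through the dimnames. A small standalone sketch of just that step, using made-up names (`states`, `idx`) for illustration:

```r
# Illustration only: a 3x3 NA matrix with state names on both dimensions
states <- c("X0", "X1", "X2")
m <- matrix(NA_integer_, nrow = 3, ncol = 3, dimnames = list(states, states))

# One row per cell to fill: first column is the row name (to),
# second column is the column name (from)
idx <- cbind(to = "X0", from = "X2")
m[idx] <- 1L

print(m)
```

Only the cell in row X0, column X2 is set; every other cell stays NA, which matches the layout of the transition tables shown in this post.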
Usage:
setDT(data)
num_transit(data, "2020-02", "2020-04")
   to_state    X0    X1    X2
     <char> <int> <int> <int>
1:       X0    NA    NA     1
2:       X1    NA    NA    NA
3:       X2    NA    NA    NA
num_transit(data, 1,3, refvar="Period_1")
   to_state    X0    X1    X2
     <char> <int> <int> <int>
1:       X0     1    NA     1
2:       X1    NA    NA    NA
3:       X2     1    NA    NA
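As written, num_transit pairs the from and to states by row order within each ID, so it appears to assume the data are sorted by period. One possible variation, sketched below, pairs the two periods through a join on ID instead. The helper name `count_transits_join` and its internals are hypothetical, not part of the answer above, and it assumes at most one row per ID and period.

```r
library(data.table)

# Hypothetical helper: looks up each ID's state at `from` and at `to`
# separately, then joins on ID so that row order does not matter.
count_transits_join <- function(x, from, to, refvar = "Period_2") {
  at_from <- x[get(refvar) == from, .(ID, from_state = State)]
  at_to   <- x[get(refvar) == to,   .(ID, to_state = State)]
  # Inner join: keeps only IDs observed in both periods
  pairs <- merge(at_from, at_to, by = "ID")
  pairs[, .(ct = .N), by = .(from_state, to_state)]
}

data <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Period_2 = c("2020-01", "2020-02", "2020-03", "2020-04", "2020-05",
               "2020-06", "2020-02", "2020-03", "2020-04"),
  State = c("X0", "X1", "X2", "X0", "X2", "X0", "X2", "X1", "X0")
)

res <- count_transits_join(data, "2020-02", "2020-04")
print(res)

by_pos <- count_transits_join(data, 1, 3, refvar = "Period_1")
print(by_pos)
```

The result is a long table of counts with columns from_state, to_state and ct. To get the wide layout, rename the columns to from, to and ct with setnames and pass the table to convert_transits_to_matrix.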