Markov chain at the aggregated level (multiple sequences)
Suppose I have three sequences:
dat <- list( Seq1 =c("A", "B", "C", "D", "C", "A", "C","D","A","A","B","D"),
Seq2 = c("C" ,"C" ,"B" ,"A" ,"D" ,"D" ,"A" ,"B","C","D","B","A","D"),
Seq3 = c("D" ,"A" ,"D" ,"A" ,"D", "B", "B", "A","D","A","D","A"))
These sequences are stored in three separate CSV files. I want to estimate a single first-order Markov chain from the aggregated data.
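# Assumes `actionsoverall` holds every state (e.g. c("A", "B", "C", "D")) and
# `files` holds the CSV paths; each file stores its sequence in column `x`.
t <- matrix(0, nrow = length(actionsoverall), ncol = length(actionsoverall))
for (i in files) {
  y <- read.csv(i)$x
  # use the same factor levels for every file so the integer codes agree
  yy <- as.integer(factor(y, levels = actionsoverall))
  for (j in 1:(length(yy) - 1)) {
    t[yy[j], yy[j + 1]] <- t[yy[j], yy[j + 1]] + 1
  }
}
# divide each row by its total to get transition probabilities
for (h in 1:length(actionsoverall)) {
  t[h, ] <- t[h, ] / sum(t[h, ])
}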
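Since the sequences live in CSV files, here is a self-contained sketch of the same pooling done with table() instead of an explicit loop. Assumptions: each file stores its sequence in a column named `x` (as `read.csv(i)$x` suggests), the temporary files written at the top are hypothetical stand-ins for the real ones, and the state set c("A", "B", "C", "D") is inferred from the example data.

```r
# Hypothetical stand-ins for the real CSV files: one file per sequence,
# with the sequence stored in a column named `x`.
dat <- list(
  Seq1 = c("A", "B", "C", "D", "C", "A", "C", "D", "A", "A", "B", "D"),
  Seq2 = c("C", "C", "B", "A", "D", "D", "A", "B", "C", "D", "B", "A", "D"),
  Seq3 = c("D", "A", "D", "A", "D", "B", "B", "A", "D", "A", "D", "A")
)
files <- vapply(dat, function(s) {
  f <- tempfile(fileext = ".csv")
  write.csv(data.frame(x = s), f, row.names = FALSE)
  f
}, character(1))

# Assumed full state set (inferred from the example data)
actionsoverall <- c("A", "B", "C", "D")

# Count (from, to) pairs per file; the shared factor levels keep every
# table 4 x 4 even though Seq3 never visits "C".
counts <- Reduce(`+`, lapply(files, function(f) {
  y <- factor(read.csv(f)$x, levels = actionsoverall)
  table(from = head(y, -1), to = y[-1])
}))

# Pooled first-order transition matrix: divide each row by its total
P <- prop.table(counts, margin = 1)
round(P, 3)
```

Summing counts before normalising is what makes this a pooled estimate rather than an average of per-file matrices.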
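Specifically, I want to read the sequence from each file and pool the transition counts. For example, if A to B occurs 2 times in file 1, once in file 2 and 3 times in file 3, and A occurs 10 times in total as the start of a transition, the probability of A to B would be (2 + 1 + 3)/10 = 6/10.
N.B.: If I compute the transition probabilities for each file separately and then average them, would I get the same result?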
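The pooled estimate described above, and its relation to per-file estimates, can be written out as follows. The notation is introduced only for this illustration: $n^{(k)}_{ij}$ is the number of $i \to j$ transitions in file $k$.

```latex
\begin{aligned}
\hat p_{ij} &= \frac{\sum_k n^{(k)}_{ij}}{\sum_k n^{(k)}_{i\cdot}},
  \qquad n^{(k)}_{i\cdot} = \sum_j n^{(k)}_{ij},
  \qquad \text{e.g. } \hat p_{AB} = \frac{2+1+3}{10} = \frac{6}{10},\\
\hat p_{ij} &= \sum_k w_{ik}\,\hat p^{(k)}_{ij},
  \qquad \hat p^{(k)}_{ij} = \frac{n^{(k)}_{ij}}{n^{(k)}_{i\cdot}},
  \qquad w_{ik} = \frac{n^{(k)}_{i\cdot}}{\sum_m n^{(m)}_{i\cdot}}.
\end{aligned}
```

So the pooled estimate is a weighted average of the per-file estimates. A plain average (every $w_{ik} = 1/3$ for three files) matches it when every file has the same number of transitions out of state $i$, or when the per-file estimates agree, but not in general; and a file in which $i$ never starts a transition has no $\hat p^{(k)}_{ij}$ at all.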
Constructing the data (note: this seq3 differs from the question's Seq3 and, unlike it, contains "C"):
dat <- list( seq1 =c( "A", "B", "C","D","C","A", "C","D","A","A","B","D"),
seq2 =c( "C","C","B","A","D","D","A","B","C","D","B","A","D"),
seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A"))
This gives you the first-order transition counts for each sequence:
lapply( dat, function(s) table( s,           # start
                                c(s[-1],NA)  # next
                              ) )
#look at matrix( c( s, c(s[-1],NA) ), ncol=2) to verify
$seq1
s A B C D
A 1 2 1 0
B 0 0 1 1
C 1 0 0 2
D 1 0 1 0
$seq2
s A B C D
A 0 1 0 2
B 2 0 1 0
C 0 1 1 1
D 1 1 0 1
$seq3
s A B C D
A 0 0 1 2
B 1 0 0 0
C 0 1 1 1
D 3 0 1 0
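Following the comment in the code above, the pairs can be checked directly. A short sketch using the answer's `dat`: it lists the (from, to) pairs of seq1 and confirms that each table counts length(s) - 1 transitions, because the last element is paired with NA and table() leaves NA out by default.

```r
dat <- list(seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
            seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
            seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A"))

# (from, to) pairs for seq1: column 1 is the current state, column 2 the next;
# the last row pairs the final state with NA
pairs1 <- matrix(c(dat$seq1, c(dat$seq1[-1], NA)), ncol = 2,
                 dimnames = list(NULL, c("from", "to")))
tail(pairs1, 3)

# table() drops NA by default, so each table should hold length(s) - 1 counts
tabs <- lapply(dat, function(s) table(s, c(s[-1], NA)))
rbind(counted = sapply(tabs, sum), expected = lengths(dat) - 1)
```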
This adds up those counts without any averaging:
Reduce( "+", lapply( dat, function(s) table( s, c(s[-1],NA) ) ) )
s A B C D
A 1 3 2 4
B 3 0 2 1
C 1 2 2 4
D 5 1 2 1
This is one way to get the transition matrix from that result:
prop.table(
Reduce( "+", lapply( dat, function(s) table( s, c(s[-1],NA) ) ) )
, 1) # specifies row-proportions
s A B C D
A 0.1000000 0.3000000 0.2000000 0.4000000
B 0.5000000 0.0000000 0.3333333 0.1666667
C 0.1111111 0.2222222 0.2222222 0.4444444
D 0.5555556 0.1111111 0.2222222 0.1111111
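On the N.B. question: averaging the per-file matrices generally does not give the pooled matrix. The sketch below (using the answer's `dat`) compares the two and shows that the pooled estimate equals a weighted average in which sequence k's row i is weighted by its share of all transitions out of state i. It assumes every state starts at least one transition in every sequence (true for this `dat`); otherwise that sequence's row is NaN and the plain average breaks.

```r
dat <- list(seq1 = c("A","B","C","D","C","A","C","D","A","A","B","D"),
            seq2 = c("C","C","B","A","D","D","A","B","C","D","B","A","D"),
            seq3 = c("D","A","D","A","C","C","B","A","D","C","D","A"))

tabs <- lapply(dat, function(s) table(s, c(s[-1], NA)))   # per-sequence counts

# Pooled estimate: sum the counts first, then normalise each row
pooled <- prop.table(Reduce(`+`, tabs), 1)

# Plain (unweighted) average of the per-sequence transition matrices
per_seq  <- lapply(tabs, prop.table, 1)
averaged <- Reduce(`+`, per_seq) / length(per_seq)
round(averaged - pooled, 3)   # non-zero entries: the two estimates differ

# Weighting sequence k's row i by its share of all transitions out of i
# recovers the pooled estimate
n_out    <- sapply(tabs, rowSums)   # states x sequences
w        <- prop.table(n_out, 1)    # each row sums to 1
weighted <- Reduce(`+`, Map(function(p, k) p * w[, k], per_seq, seq_along(per_seq)))
all.equal(as.vector(weighted), as.vector(pooled))
```

For example, row A of seq1 has 4 outgoing transitions and rows A of seq2 and seq3 have 3 each, so the plain average over-weights seq2 and seq3 for that row.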
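Here is a new strategy. It also handles sequences that do not contain every state: in the question's data Seq3 has no "C", so its table is only 3 x 3 and Reduce("+", ...) stops with a "non-conformable arrays" error. (The output below was computed from the question's data.)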
newdat <- do.call('rbind', lapply(lapply( dat, function(s) table( s,
c(s[-1],NA)
) ) , as.data.frame))
str(newdat)
'data.frame': 41 obs. of 3 variables:
$ s : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
$ Var2: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 2 2 2 2 3 3 ...
$ Freq: int 1 0 1 1 2 0 0 0 1 1 ...
With the newdat object, you can simply use xtabs to tabulate the s and Var2 columns and get the summed counts:
xtabs( Freq ~ s + Var2, newdat)
Var2
s A B C D
A 1 3 1 6
B 3 1 2 1
C 1 1 1 3
D 6 2 1 1
Then redo the prop.table operation to get the row proportions:
prop.table(xtabs( Freq ~ s + Var2, newdat), 1)
#---------
Var2
s A B C D
A 0.09090909 0.27272727 0.09090909 0.54545455
B 0.42857143 0.14285714 0.28571429 0.14285714
C 0.16666667 0.16666667 0.16666667 0.50000000
D 0.60000000 0.20000000 0.10000000 0.10000000
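An alternative sketch that keeps the Reduce approach: convert each sequence to a factor with a shared set of levels so that every table is 4 x 4. Here `states` is derived from the data, which assumes every possible state occurs in at least one sequence. With the question's data it gives the same counts as the xtabs result above.

```r
# The question's data: Seq3 never visits "C"
dat <- list(
  Seq1 = c("A", "B", "C", "D", "C", "A", "C", "D", "A", "A", "B", "D"),
  Seq2 = c("C", "C", "B", "A", "D", "D", "A", "B", "C", "D", "B", "A", "D"),
  Seq3 = c("D", "A", "D", "A", "D", "B", "B", "A", "D", "A", "D", "A")
)

# Without shared levels the tables differ in size (Seq3 gives a 3 x 3 table),
# which is why Reduce("+", ...) cannot add them directly
sapply(lapply(dat, function(s) table(s, c(s[-1], NA))), nrow)

# Assumed state set: every state seen in any sequence
states <- sort(unique(unlist(dat)))

# Shared factor levels make every table 4 x 4, so the tables can be summed
counts <- Reduce(`+`, lapply(dat, function(s)
  table(s    = factor(s, levels = states),
        Var2 = factor(c(s[-1], NA), levels = states))))

counts
prop.table(counts, 1)
```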