Generating a Markov transition matrix in Python
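Suppose I have a sequence of 4 possible Markov states (A, B, C, D):
X = [A, B, B, C, B, A, D, D, A, B, A, D, ....]
How can I generate a Markov transition matrix using Python? The matrix must be 4 by 4, showing the probability of moving from each state to the other 3 states.
I've been looking at many examples online, but in all of them the matrix is given, not calculated from data.
I also looked into hmmlearn, but nowhere did I read how to have it spit out the transition matrix. Is there a library I can use for this purpose?
Here is R code for the exact thing I am trying to do in Python: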
https://stats.stackexchange.com/questions/26722/calculate-transition-matrix-markov-in-r
This might give you some ideas:
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

def rank(c):
    # map 'A' -> 0, 'B' -> 1, ...
    return ord(c) - ord('A')

T = [rank(c) for c in transitions]

# create matrix of zeros
M = [[0]*4 for _ in range(4)]

# count each observed transition i -> j
for (i, j) in zip(T, T[1:]):
    M[i][j] += 1

# now convert to probabilities:
for row in M:
    n = sum(row)
    if n > 0:
        row[:] = [f/n for f in row]

# print M:
for row in M:
    print(row)
Output:
[0.0, 0.5, 0.0, 0.5]
[0.5, 0.25, 0.25, 0.0]
[0.0, 1.0, 0.0, 0.0]
[0.5, 0.0, 0.0, 0.5]
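For longer sequences the same counting can be vectorised. The following is only a sketch, assuming NumPy is available; the names `states`, `index`, `counts` and `P` are my own and not part of the answer above. It maps labels to integers, accumulates the pair counts with `np.add.at`, and normalises each row, leaving rows with no observed transitions as zeros.

```python
import numpy as np

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

# map each label to a row/column index (hypothetical helper names)
states = sorted(set(transitions))
index = {s: k for k, s in enumerate(states)}
T = np.array([index[s] for s in transitions])

counts = np.zeros((len(states), len(states)))
# np.add.at accumulates correctly even when the same (i, j) pair repeats
np.add.at(counts, (T[:-1], T[1:]), 1)

row_sums = counts.sum(axis=1, keepdims=True)
# divide only where a row has at least one observed transition
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
print(P)
```

Here `P[i][j]` is the estimated probability of moving from `states[i]` to `states[j]`.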
On Edit: here is a function that implements the above idea:
# the following code takes a list such as
# [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
# with states labeled as successive integers starting with 0
# and returns a transition matrix, M,
# where M[i][j] is the probability of transitioning from i to j

def transition_matrix(transitions):
    n = 1 + max(transitions)  # number of states

    M = [[0]*n for _ in range(n)]

    for (i, j) in zip(transitions, transitions[1:]):
        M[i][j] += 1

    # now convert to probabilities:
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f/s for f in row]
    return M

# test:
t = [1,1,2,6,8,5,5,7,8,8,1,1,4,5,5,0,0,0,1,1,4,4,5,1,3,3,4,5,4,1,1]
m = transition_matrix(t)
for row in m: print(' '.join('{0:.2f}'.format(x) for x in row))
Output:
0.67 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.50 0.12 0.12 0.25 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
0.00 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00
0.00 0.20 0.00 0.00 0.20 0.60 0.00 0.00 0.00
0.17 0.17 0.00 0.00 0.17 0.33 0.00 0.17 0.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
0.00 0.33 0.00 0.00 0.00 0.33 0.00 0.00 0.33
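The function above expects states that are already integers starting at 0. For labeled states such as 'A' to 'D', one option is to build the label-to-index mapping inside the function. This is a minimal sketch under that assumption; `transition_matrix_labeled` is a hypothetical name, not something from the answer above.

```python
from collections import Counter

def transition_matrix_labeled(sequence):
    """Return (states, M), where M[i][j] estimates P(states[j] | states[i])."""
    states = sorted(set(sequence))
    index = {s: k for k, s in enumerate(states)}

    # count every consecutive pair (a, b) in the sequence
    counts = Counter(zip(sequence, sequence[1:]))

    n = len(states)
    M = [[0.0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        M[index[a]][index[b]] = c

    # convert counts to probabilities, skipping rows with no outgoing transitions
    for row in M:
        s = sum(row)
        if s > 0:
            row[:] = [f / s for f in row]
    return states, M

states, M = transition_matrix_labeled(
    ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D'])
for label, row in zip(states, M):
    print(label, row)
```

Returning `states` alongside `M` keeps the row and column order explicit, which matters once the labels are no longer consecutive integers.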
In case you want to do it all in pandas, here is a way that works with non-numerical data:
import pandas as pd
transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']
df = pd.DataFrame(transitions)
# create a new column with data shifted one space
df['shift'] = df[0].shift(-1)
# add a count column (for group by function)
df['count'] = 1
# groupby and then unstack, fill the zeros
trans_mat = df.groupby([0, 'shift']).count().unstack().fillna(0)
# normalise by occurrences and save values to get transition matrix
trans_mat = trans_mat.div(trans_mat.sum(axis=1), axis=0).values
It is slower than the pure Python approach, but it may be worth it for the flexibility and for not having to write your own function.
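A shorter pandas variant, offered as a sketch rather than a drop-in replacement, is `pd.crosstab` with `normalize='index'`, which counts the pairs and normalises each row in one call. The series names `'from'` and `'to'` are labels I chose for illustration.

```python
import pandas as pd

transitions = ['A', 'B', 'B', 'C', 'B', 'A', 'D', 'D', 'A', 'B', 'A', 'D']

# pair each state with the state that follows it
current = pd.Series(transitions[:-1], name='from')
following = pd.Series(transitions[1:], name='to')

# rows are the current state, columns the next state; each row sums to 1
trans_mat = pd.crosstab(current, following, normalize='index')
print(trans_mat)
```

Note that a state which never appears as a source (or never as a target) will be missing from the rows (or columns), so the result may need a `reindex` to be square.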
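The code below provides another solution for a Markov transition matrix of order 1. Your data can be a list of integers, a list of strings, or a string. The downside is that this solution will most likely be costly in time and memory.
- create the order-1 Markov transition matrix (bigram)
- generate 1000 integers as a dataset on which to train the Markov transition matrix
- train the Markov transition matrix
At this point we have the answer to the question. The code below tries to solve an additional problem: generating data according to the trained Markov transition matrix.
- transform the probabilities of the Markov transition matrix to cumulative ones (arithmetic coding)
- generate 30 data points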
import random

import pandas as pd

def transition_matrix_order1(data):
    # collect the distinct states in sorted order
    alphabet = []
    for element in data:
        if element not in alphabet:
            alphabet.append(element)
    alphabet.sort()

    previous = data[0]
    matrix = pd.DataFrame(0.0, index=alphabet, columns=alphabet)

    # count transitions: column = current state, row = next state
    # (.loc avoids chained assignment, which newer pandas does not write through)
    for i in data[1:]:
        matrix.loc[i, previous] += 1.0
        previous = i

    # normalise each column so that it holds P(next state | current state)
    total = matrix.sum()
    for element in alphabet:
        matrix[element] = matrix[element] / total[element]

    return matrix, alphabet

# create data using random integers
data = [random.randint(1, 5) for i in range(1000)]  # you can also pass a list of strings or a string

# create markov transition matrix order 1 (bigram)
markov_matrix, alphabet = transition_matrix_order1(data)

# the following code uses the probabilities in order to create new data

# transform probabilities of markov transition matrix to cumulative (down each column)
for column in alphabet:
    for pos, index in enumerate(alphabet[1:]):
        markov_matrix.loc[index, column] += markov_matrix.loc[alphabet[pos], column]

# generating 30 data
generated_data = []
feed = random.choice(alphabet)
generated_data.append(feed)
for _ in range(30):
    random_value = random.uniform(0, 1)
    # pick the first state whose cumulative probability reaches the random value
    for state in alphabet:
        if markov_matrix.loc[state, feed] >= random_value:
            generated_data.append(state)
            feed = state
            break

print(generated_data)
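If the matrix is stored row-wise as in the first answer (`M[i][j]` is the probability of going from state i to state j), sampling can also be done without building cumulative sums by hand, since the standard library's `random.choices` accepts weights directly. The following is a sketch only; `generate_sequence` and its parameters are hypothetical names, and the matrix is the one computed earlier for the A, B, C, D example.

```python
import random

def generate_sequence(M, states, start, length, seed=None):
    """Sample a state sequence from a row-stochastic matrix M."""
    rng = random.Random(seed)
    index = {s: k for k, s in enumerate(states)}
    current = start
    out = [current]
    for _ in range(length - 1):
        weights = M[index[current]]
        if sum(weights) == 0:
            # no outgoing transition was ever observed from this state
            break
        current = rng.choices(states, weights=weights, k=1)[0]
        out.append(current)
    return out

states = ['A', 'B', 'C', 'D']
M = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.25, 0.25, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.5]]

sequence = generate_sequence(M, states, start='A', length=30, seed=0)
print(sequence)
```

Passing a `seed` makes the generated sequence reproducible, which helps when comparing the sampled transition frequencies against `M`.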