Getting a bad result for a random walk
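I want to implement a random walk and compute its steady state.
Suppose my graph looks like the figure below:
The graph above is defined in a file as follows: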
1 2 0.9
1 3 0.1
2 1 0.8
2 2 0.1
2 4 0.1
etc
To read and build this graph, I use the following method:
import sys
import networkx as nx

def _build_og(self, original_ppi):
    """ Build the original graph, without any nodes removed. """
    try:
        graph_fp = open(original_ppi, 'r')
    except IOError:
        sys.exit("Could not open file: {}".format(original_ppi))
    G = nx.DiGraph()
    edge_list = []
    # parse network input
    for line in graph_fp.readlines():
        split_line = line.rstrip().split('\t')
        # assume input graph is a simple edgelist with weights
        edge_list.append((split_line[0], split_line[1], float(split_line[2])))
    G.add_weighted_edges_from(edge_list)
    graph_fp.close()
    print(edge_list)
    return G
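A side note on parsing: split('\t') only works if the file really is tab-separated. If the columns are separated by spaces (as in the excerpt above, though that may just be how it was pasted), '1 2 0.9'.split('\t') returns a single field and split_line[1] raises IndexError. A minimal sketch of a more tolerant parser, assuming one "source target weight" triple per line; parse_edge_lines is a hypothetical helper, not part of NetworkX:

```python
def parse_edge_lines(lines):
    """Parse 'source target weight' lines into (source, target, weight) tuples.

    Splits on any run of whitespace, so tab- and space-separated files
    both work; blank lines are skipped.
    """
    edges = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError("expected 3 fields, got {!r}".format(line))
        source, target, weight = fields
        # node names stay strings, exactly as in _build_og
        edges.append((source, target, float(weight)))
    return edges


sample = ["1\t2\t0.9\n", "1 3 0.1\n", "\n", "2\t1\t0.8\n"]
print(parse_edge_lines(sample))
```

It accepts any iterable of lines, so "with open(original_ppi) as fp: edges = parse_edge_lines(fp)" also closes the file automatically.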
In the function above, do I need to define the graph as a DiGraph or as a simple Graph?
We build the transition matrix as follows:
def _build_matrices(self, original_ppi, low_list, remove_nodes):
    """ Build column-normalized adjacency matrix for each graph.
    NOTE: these are column-normalized adjacency matrices (not nx
    graphs), used to compute each p-vector
    """
    original_graph = self._build_og(original_ppi)
    self.OG = original_graph
    og_not_normalized = nx.to_numpy_matrix(original_graph)
    self.og_matrix = self._normalize_cols(og_not_normalized)
Then I normalize the matrix using:
from sklearn.preprocessing import normalize

def _normalize_cols(self, matrix):
    """ Normalize the columns of the adjacency matrix """
    return normalize(matrix, norm='l1', axis=0)
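normalize(matrix, norm='l1', axis=0) (presumably scikit-learn's sklearn.preprocessing.normalize) divides each column by the sum of its absolute values. The NumPy sketch below does the same thing with a hypothetical helper, normalize_cols_l1, and shows the risk: when the input columns do not already sum to 1 (here a matrix whose rows sum to 1 instead), normalizing silently produces different probabilities rather than flagging the data.

```python
import numpy as np

def normalize_cols_l1(matrix):
    """Divide each column by its L1 norm (sum of absolute values).

    Like normalize(matrix, norm='l1', axis=0), all-zero columns are
    left as zeros instead of dividing by zero.
    """
    m = np.asarray(matrix, dtype=float)
    norms = np.abs(m).sum(axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for empty columns
    return m / norms  # broadcasting divides column j by norms[j]


# Rows sum to 1 here (row = source node), but the columns do not.
row_stochastic = np.array([[0.0, 0.9, 0.1],
                           [0.8, 0.1, 0.1],
                           [0.5, 0.5, 0.0]])
normalized = normalize_cols_l1(row_stochastic)
print(normalized.sum(axis=0))                   # every column now sums to 1
print(np.allclose(normalized, row_stochastic))  # but the entries changed
```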
Now we define the random-walk simulation:
import numpy as np

def run_exp(self, source):
    CONV_THRESHOLD = 0.000001
    # set up the starting probability vector
    p_0 = self._set_up_p0(source)
    diff_norm = 1
    # this needs to be a deep copy, since we're reusing p_0 later
    p_t = np.copy(p_0)
    while (diff_norm > CONV_THRESHOLD):
        # first, calculate p^(t + 1) from p^(t)
        p_t_1 = self._calculate_next_p(p_t, p_0)
        # calculate L1 norm of difference between p^(t + 1) and p^(t),
        # for checking the convergence condition
        diff_norm = np.linalg.norm(np.subtract(p_t_1, p_t), 1)
        # then, set p^(t) = p^(t + 1), and loop again if necessary
        # no deep copy necessary here, we're just renaming p
        p_t = p_t_1
    # hand back the converged probability vector
    return p_t
We set up the initial state (p_0) with the following method:
def _set_up_p0(self, source):
    """ Set up and return the 0th probability vector. """
    p_0 = [0] * self.OG.number_of_nodes()
    # list the nodes of self.OG (same order as the matrix rows/columns)
    l = list(self.OG.nodes())
    #nx.draw(self.OG, with_labels=True)
    #plt.show()
    for source_id in source:
        try:
            # matrix columns are in the same order as nodes in original nx
            # graph, so we can get the index of the source node from the OG
            source_index = l.index(source_id)
            p_0[source_index] = 1 / float(len(source))
        except ValueError:
            sys.exit("Source node {} is not in original graph. Source: {}. Exiting.".format(source_id, source))
    return np.array(p_0)
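Because _build_og reads node names from the file as strings, the node list is ['1', '2', '3', '4'], so source has to contain '1' rather than 1, or l.index raises ValueError. A minimal NumPy sketch of the same uniform starting vector, using a dict lookup instead of repeated list.index calls; uniform_start is a hypothetical name:

```python
import numpy as np

def uniform_start(nodes, source):
    """Return p_0 with probability 1/len(source) on each source node.

    `nodes` must be in the same order as the matrix rows/columns.
    """
    index = {node: i for i, node in enumerate(nodes)}
    missing = [s for s in source if s not in index]
    if missing:
        raise ValueError("source nodes not in graph: {}".format(missing))
    p_0 = np.zeros(len(nodes))
    # spread the starting mass evenly over the source nodes
    p_0[[index[s] for s in source]] = 1.0 / len(source)
    return p_0


print(uniform_start(['1', '2', '3', '4'], ['1', '3']))
```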
To generate the next state, we use the function below, i.e. a power-iteration strategy:
def _calculate_next_p(self, p_t, p_0):
    """ Calculate the next probability vector. """
    print('p_0\t{}'.format(p_0))
    print('p_t\t{}'.format(p_t))
    epsilon = np.squeeze(np.asarray(np.dot(self.og_matrix, p_t)))
    print('epsilon\t{}'.format(epsilon))
    print(10 * "*")
    return np.array(epsilon)
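One thing worth checking here: np.dot(self.og_matrix, p_t) is the right update only if column j of the matrix holds the probabilities of leaving node j. NetworkX documents the matrix returned by to_numpy_matrix as having entry (i, j) equal to the weight of the edge from node i to node j, so its rows are source nodes. The sketch below lays out this example's edge probabilities that way (P is a hypothetical name, and the edges hidden behind "etc" are taken from the matrix in the answer, which is an assumption) and shows that the update then has to be p @ P, or equivalently P.T @ p, whereas P @ p mixes up source and target.

```python
import numpy as np

# Row-stochastic layout: P[i, j] = probability of moving from node i to node j,
# so each ROW sums to 1 (the layout of an edge-weight adjacency matrix).
P = np.array([[0.0, 0.9, 0.1, 0.0],
              [0.8, 0.1, 0.0, 0.1],
              [0.0, 0.5, 0.3, 0.2],
              [0.1, 0.0, 0.0, 0.9]])

p = np.array([1.0, 0.0, 0.0, 0.0])  # walker starts at node 1

next_row_form = p @ P    # p_next[j] = sum_i p[i] * P[i, j]
next_col_form = P.T @ p  # same update using the column-stochastic P.T
wrong = P @ p            # reads column 0 of P: not a valid step from node 1

print(next_row_form)
print(np.allclose(next_row_form, next_col_form))
print(wrong.sum())  # does not sum to 1
```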
Assume the random walk can start from any node (1, 2, 3 or 4).
When I run the code, I get the following result:
2 0.32
3 0.31
1 0.25
4 0.11
The result should be:
(0.28, 0.30, 0.04, 0.38).
Can someone help me find where my mistake is?
I'm not sure whether the problem is in my transition matrix.
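The matrix should be as follows (assuming your transition matrix multiplies the state vector from the left, it is a left stochastic matrix: the columns add up to 1, and the (i, j) entry is the probability of going from j to i).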
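In symbols, writing X_t for the walker's node at step t, p_t for the state vector and \pi for the steady state (notation introduced here), that convention and the iteration it implies are:

```latex
\begin{aligned}
T_{ij} &= \Pr(X_{t+1} = i \mid X_t = j), \qquad \sum_{i} T_{ij} = 1 \ \text{for every } j,\\
p_{t+1} &= T\, p_t,\\
\pi &= T\, \pi, \qquad \sum_{i} \pi_i = 1.
\end{aligned}
```

So the steady state is an eigenvector of T with eigenvalue 1, normalized to sum to 1, which is what repeated multiplication by T converges to.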
import numpy as np

transition = np.array([[0, 0.8, 0, 0.1],
                       [0.9, 0.1, 0.5, 0],
                       [0.1, 0, 0.3, 0],
                       [0, 0.1, 0.2, 0.9]])
state = np.array([1, 0, 0, 0])  # could be any other initial position
diff = tol = 0.001
while diff >= tol:
    next_state = transition.dot(state)
    diff = np.linalg.norm(next_state - state, ord=np.inf)
    state = next_state
print(np.around(state, 3))
This prints [0.279 0.302 0.04 0.378].
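Power iteration only approaches the steady state, so with tol = 0.001 the printed values can still differ slightly from the exact limit. Since \pi = T\pi together with sum(\pi) = 1 is a linear system, it can also be solved directly. A minimal sketch (a swapped-in technique, not part of the loop above): because every column of T sums to 1, the rows of T - I add up to the zero vector, so one equation is redundant and can be replaced by the normalization constraint.

```python
import numpy as np

transition = np.array([[0, 0.8, 0, 0.1],
                       [0.9, 0.1, 0.5, 0],
                       [0.1, 0, 0.3, 0],
                       [0, 0.1, 0.2, 0.9]])
n = transition.shape[0]

# (T - I) pi = 0 has one redundant row; swap the last row for sum(pi) = 1.
A = transition - np.eye(n)
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0

pi = np.linalg.solve(A, b)
print(np.around(pi, 3))
```

The exact solution is (63, 68, 9, 86)/226, roughly (0.279, 0.301, 0.040, 0.381), which matches the expected (0.28, 0.30, 0.04, 0.38). This chain is irreducible and has self-loops (so it is aperiodic), which is why the limit does not depend on the starting node.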
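I don't know whether you are loading the data incorrectly or whether it is something else. The "column normalization" step is a warning sign: if the given transition probabilities don't add up to 1, you should report bad data rather than normalize the columns. And I don't see why you are using NetworkX when the data is already presented as a matrix: the table you were given can be read as
column row entry
and that matrix is exactly what the computation needs.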
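Following that suggestion, a minimal sketch that reads each (source, target, probability) triple as (column, row, entry) and rejects data whose columns do not sum to 1 instead of normalizing it. transition_from_edges is a hypothetical helper, and the edges hidden behind "etc" in the question are filled in from the matrix above, which is an assumption:

```python
import numpy as np

def transition_from_edges(edges, nodes, atol=1e-9):
    """Build a left-stochastic matrix T with T[row, col] = P(col -> row).

    Each (source, target, prob) triple is read as (column, row, entry).
    Raises ValueError if a column does not sum to 1 instead of
    renormalizing, so bad input data gets reported.
    """
    index = {node: i for i, node in enumerate(nodes)}
    T = np.zeros((len(nodes), len(nodes)))
    for source, target, prob in edges:
        # accumulate, so duplicate lines push a column over 1 and get caught
        T[index[target], index[source]] += prob
    col_sums = T.sum(axis=0)
    bad = [nodes[j] for j in range(len(nodes))
           if not np.isclose(col_sums[j], 1.0, atol=atol)]
    if bad:
        raise ValueError("probabilities out of node(s) {} do not sum to 1".format(bad))
    return T


edges = [('1', '2', 0.9), ('1', '3', 0.1),
         ('2', '1', 0.8), ('2', '2', 0.1), ('2', '4', 0.1),
         ('3', '2', 0.5), ('3', '3', 0.3), ('3', '4', 0.2),
         ('4', '1', 0.1), ('4', '4', 0.9)]
T = transition_from_edges(edges, ['1', '2', '3', '4'])
print(T)
```

The resulting T is the same left-stochastic matrix as transition above, so it can go straight into the power-iteration loop.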