在 python 中实现用于局部对齐的 Smith-Waterman 算法
Implementing Smith-Waterman algorithm for local alignment in python
我创建了一个序列比对工具来比较两条 DNA 链(X 和 Y),以找到 X 和 Y 的子串的最佳比对。算法总结如下(https://en.wikipedia.org/wiki/Smith–Waterman_algorithm)。我已经能够生成一个列表列表,用零填充它们,以表示我的矩阵。我创建了一个评分算法,以 return 对碱基之间的每种比对进行数值评分(例如,匹配加 4)。然后我创建了一个对齐算法,应该在我的“矩阵”的每个坐标中放置一个分数。但是,当我去打印矩阵时,它只 return 是全零的原始值(而不是实际分数)。
我知道还有其他方法可以实现此方法(例如使用 numpy),那么您能告诉我为什么这个特定代码(下面)不起作用吗?有没有办法修改它,使其正常工作?
代码:
def zeros(X: int, Y: int):
lenX = len(X) + 1
lenY = len(Y) + 1
matrix = []
for i in range(lenX):
matrix.append([0] * lenY)
def score(X, Y):
if X[n] == Y[m]: return 4
if X[n] == '-' or Y[m] == '-': return -4
else: return -2
def SmithWaterman(X, Y, score):
for n in range(1, len(X) + 1):
for m in range(1, len(Y) + 1):
align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
matrix[n, m] = max(align, indelX, indelY, 0)
print(matrix)
zeros("ACGT", "ACGT")
输出:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
它只是打印出归零矩阵的原因是 SmithWaterman
函数从未被调用,所以矩阵永远不会更新。
你需要做类似的事情
# ...
SmithWaterman(X, Y, score)
print(matrix)
# ...
但是,如果您这样做,您会发现这段代码实际上在许多其他方面都存在问题。我已经完成并注释了一些语法错误和代码的其他问题:
def zeros(X: int, Y: int):
# ^ ^ incorrect type annotations. should be str
lenX = len(X) + 1
lenY = len(Y) + 1
matrix = []
for i in range(lenX):
matrix.append([0] * lenY)
# A more "pythonic" way of expressing the above would be:
# matrix = [[0] * len(Y) + 1 for _ in range(len(x) + 1)]
def score(X, Y):
# ^ ^ shadowing variables from outer scope. this is not a bug per se but it's considered bad practice
if X[n] == Y[m]: return 4
# ^ ^ variables not defined in scope
if X[n] == '-' or Y[m] == '-': return -4
# ^ ^ variables not defined in scope
else: return -2
def SmithWaterman(X, Y, score): # this function is never called
# ^ unnecessary function passed as parameter. function is defined in scope
for n in range(1, len(X) + 1):
for m in range(1, len(Y) + 1):
align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
# ^ invalid list lookup. should be: matrix[n-1][m-1]
indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
# ^ out of bounds error when m == len(Y)
indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
# ^ out of bounds error when n == len(X)
matrix[n, m] = max(align, indelX, indelY, 0)
# this should be nested in the inner for-loop. m, n, indelX, and indelY are not defined in scope here
print(matrix)
zeros("ACGT", "ACGT")
我创建了一个序列比对工具来比较两条 DNA 链(X 和 Y),以找到 X 和 Y 的子串的最佳比对。算法总结如下(https://en.wikipedia.org/wiki/Smith–Waterman_algorithm)。我已经能够生成一个列表列表,用零填充它们,以表示我的矩阵。我创建了一个评分算法,以 return 对碱基之间的每种比对进行数值评分(例如,匹配加 4)。然后我创建了一个对齐算法,应该在我的“矩阵”的每个坐标中放置一个分数。但是,当我去打印矩阵时,它只 return 是全零的原始值(而不是实际分数)。
我知道还有其他方法可以实现此方法(例如使用 numpy),那么您能告诉我为什么这个特定代码(下面)不起作用吗?有没有办法修改它,使其正常工作?
代码:
def zeros(X: int, Y: int):
lenX = len(X) + 1
lenY = len(Y) + 1
matrix = []
for i in range(lenX):
matrix.append([0] * lenY)
def score(X, Y):
if X[n] == Y[m]: return 4
if X[n] == '-' or Y[m] == '-': return -4
else: return -2
def SmithWaterman(X, Y, score):
for n in range(1, len(X) + 1):
for m in range(1, len(Y) + 1):
align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
matrix[n, m] = max(align, indelX, indelY, 0)
print(matrix)
zeros("ACGT", "ACGT")
输出:
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
它只是打印出归零矩阵的原因是 SmithWaterman
函数从未被调用,所以矩阵永远不会更新。
你需要做类似的事情
# ...
SmithWaterman(X, Y, score)
print(matrix)
# ...
但是,如果您这样做,您会发现这段代码实际上在许多其他方面都存在问题。我已经完成并注释了一些语法错误和代码的其他问题:
def zeros(X: int, Y: int):
# ^ ^ incorrect type annotations. should be str
lenX = len(X) + 1
lenY = len(Y) + 1
matrix = []
for i in range(lenX):
matrix.append([0] * lenY)
# A more "pythonic" way of expressing the above would be:
# matrix = [[0] * len(Y) + 1 for _ in range(len(x) + 1)]
def score(X, Y):
# ^ ^ shadowing variables from outer scope. this is not a bug per se but it's considered bad practice
if X[n] == Y[m]: return 4
# ^ ^ variables not defined in scope
if X[n] == '-' or Y[m] == '-': return -4
# ^ ^ variables not defined in scope
else: return -2
def SmithWaterman(X, Y, score): # this function is never called
# ^ unnecessary function passed as parameter. function is defined in scope
for n in range(1, len(X) + 1):
for m in range(1, len(Y) + 1):
align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
# ^ invalid list lookup. should be: matrix[n-1][m-1]
indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
# ^ out of bounds error when m == len(Y)
indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
# ^ out of bounds error when n == len(X)
matrix[n, m] = max(align, indelX, indelY, 0)
# this should be nested in the inner for-loop. m, n, indelX, and indelY are not defined in scope here
print(matrix)
zeros("ACGT", "ACGT")