Smith-Waterman implementation in Python
I want to write the first part of the Smith-Waterman algorithm in Python using basic functions.
I found this example, but it doesn't give me what I want.
def zeros(X: int, Y: int):
    #        ^       ^ incorrect type annotations. should be str
    lenX = len(X) + 1
    lenY = len(Y) + 1
    matrix = []
    for i in range(lenX):
        matrix.append([0] * lenY)
    # A more "pythonic" way of expressing the above would be:
    # matrix = [[0] * (len(Y) + 1) for _ in range(len(X) + 1)]

    def score(X, Y):
        #     ^  ^ shadowing variables from outer scope. this is not a bug per se but it's considered bad practice
        if X[n] == Y[m]: return 4
        #    ^       ^ variables not defined in scope
        if X[n] == '-' or Y[m] == '-': return -4
        #    ^              ^ variables not defined in scope
        else: return -2

    def SmithWaterman(X, Y, score): # this function is never called
        #                   ^ unnecessary function passed as parameter. function is defined in scope
        for n in range(1, len(X) + 1):
            for m in range(1, len(Y) + 1):
                align = matrix[n-1, m-1] + (score(X[n-1], Y[m-1]))
                #             ^ invalid list lookup. should be: matrix[n-1][m-1]
                indelX = matrix[n-1, m] + (score(X[n-1], Y[m]))
                #                                          ^ out of bounds error when m == len(Y)
                indelY = matrix[n, m-1] + (score(X[n], Y[m-1]))
                #                                  ^ out of bounds error when n == len(X)

    matrix[n, m] = max(align, indelX, indelY, 0)
    # this should be nested in the inner for-loop. m, n, indelX, and indelY are not defined in scope here
    print(matrix)

zeros("ACGT", "ACGT")
I found this algorithm in a book, but I can't implement it correctly.
input: sequences s and t, with |s| = n, |t| = m, score function, penalty InDel
Match +1, mismatch -2, InDel -1
M = matrix of size (n+1) x (m+1)
M[i,j] = 0
i=j=0
Please help.
Thanks.
An implementation of the algorithm from the image:
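# s, t, n, m, score and penal are taken from the book's pseudocode
M = []
for i in range(n):
    M.append([])
    for j in range(m):
        # i - 1 and j - 1 run out of range in the first row and column
        first = M[i - 1][j - 1] + score(s[i], t[j])
        second = M[i - 1][j] + penal
        third = M[i][j - 1] + penal
        M[i].append(max(first, second, third, 0))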
But you still have to fix the edge cases (out of range) and add some default values.
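One way to handle those edge cases is to pre-fill a matrix of size (n+1) x (m+1) with zeros, so the first row and column act as the default values and the loops can start at 1. The following is only a minimal sketch under that assumption; the function name `smith_waterman_matrix` and its keyword parameters are made up for illustration, and the default scores are the ones quoted in the question (match +1, mismatch -2, InDel -1).

```python
def smith_waterman_matrix(s, t, match=1, mismatch=-2, indel=-1):
    """Fill the (len(s)+1) x (len(t)+1) local-alignment score matrix.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    n, m = len(s), len(t)
    # Row 0 and column 0 stay at 0, so M[i - 1] and M[..][j - 1] always exist.
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The matrix is shifted by one against the strings: cell (i, j)
            # compares s[i - 1] with t[j - 1].
            pair = match if s[i - 1] == t[j - 1] else mismatch
            diag = M[i - 1][j - 1] + pair   # align the two characters
            up = M[i - 1][j] + indel        # gap in t
            left = M[i][j - 1] + indel      # gap in s
            M[i][j] = max(diag, up, left, 0)
    return M


for row in smith_waterman_matrix("AC", "AC"):
    print(row)
# [0, 0, 0]
# [0, 1, 0]
# [0, 0, 2]
```

Because the indel value is passed as a negative number here, it is added rather than subtracted, which matches the `+ penal` form used above.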
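The problems with the code you provided are well described in the comments in that code.
Assuming you want a linear gap penalty of 2 points, and that you are only looking for the first phase of the algorithm (so not including the traceback process), the code could be fixed as follows: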
def score(x, y):
    return 4 if x == y else (
        -4 if '-' in (x, y) else -2
    )

def zeros(a, b):
    penalty = 2  # linear penalty (see Wikipedia)
    nextrow = [0] * (len(b) + 1)
    matrix = [nextrow]
    for valA in a:
        row, nextrow = nextrow, [0]
        for m, valB in enumerate(b):
            nextrow.append(max(
                row[m] + score(valA, valB),
                row[m+1] - penalty,
                nextrow[m] - penalty,
                0
            ))
        matrix.append(nextrow)
    return matrix

# Example run:
result = zeros("ACGT", "AC-GT")
print(result)
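If you later want to read something off the filled matrix, the usual next step is to locate its highest-scoring cell, since that is where a traceback would start. Below is a small sketch of that lookup only; `best_cell` is a hypothetical helper name, and the example matrix is hand-written for illustration rather than produced by the code above.

```python
def best_cell(matrix):
    """Return (score, row, col) of the highest cell; the first one found wins ties."""
    best = (0, 0, 0)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value > best[0]:
                best = (value, i, j)
    return best


# Hand-written illustrative matrix, not actual algorithm output.
example = [
    [0, 0, 0],
    [0, 4, 2],
    [0, 2, 8],
]
print(best_cell(example))  # (8, 2, 2)
```

It works on any list of lists, so it can be applied directly to the return value of `zeros`.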