是否可以将编辑距离问题的每一个解都显示在动态规划算法得到的矩阵上?

Can every solution of the edit distance problem be shown on the matrix obtained from the dynamic programming algorithm?

我在 Youtube 上看了一个视频,解释了如何通过从动态规划算法得到的矩阵的最后一个单元格返回到 1 来计算使用了哪些操作以及使用了多少次。虽然我理解视频中的例子,但当我尝试用其他例子来做时,我做不到。是否可以在矩阵上显示每个解决方案? 我还在最后添加了计算矩阵的代码。

假设我们的单词是 "apple""paper".

动态规划算法得到的矩阵(我用的是Levensthein Distance运算。):

“apple”和“paper”的最小编辑距离为4。

其中一个解决方案是 4 次替换操作。

1. apple -> ppple(将“a”替换为“ p")

2. ppple -> paple(替换“p”用 "a")

3. paple -> papee(替换“l”用“e”)

4. papee -> paper(将“e”替换为“ r")

矩阵解:

是否可以在此矩阵上显示其他解决方案?

例如:

1. 苹果 -> 苹果(插入“p”)

2. 苹果 -> 苹果(删除“p”)

3. paple -> pape(删除“l”)

4. 纸 -> 纸(插入“r”)

代码:

def min_edit_count(word1, word2):

    word1 = ' ' + word1     #
    word2 = ' ' + word2     # add a space before original words

    len_w1 = len(word1)     #
    len_w2 = len(word2)     # calculate the lengths of new words

    edit_matrix = np.zeros((len_w2, len_w1), dtype = int)
    # create a matrix with all zeros

    edit_matrix[0, :] = range(len_w1)  
    # assign numbers from 0 to len_w1 in the first row of the edit_matrix 
    edit_matrix[:, 0] = range(len_w2)
    # assign numbers from 0 to len_w2 in the first column of the edit_matrix 

    for i in range(1, len_w2):
        for j in range(1, len_w1):
            # edit_matrix[i-1][j] --> remove
            # edit_matrix[i][j-1] --> insert
            # edit_matrix[i-1][j-1] --> replace

            temp1 = edit_matrix[i-1][j] + 1
            temp2 = edit_matrix[i][j-1] + 1
            # add 1 to edit_matrix[i-1][j] and edit_matrix[i][j-1]

            temp3 = edit_matrix[i-1][j-1] + 1 if word1[j] != word2[i] else edit_matrix[i-1][j-1]
            # if last characters are same don't add 1 to edit_matrix[i-1][j-1].
            # no need to replace

            edit_count = min(temp1, temp2, temp3)
            # find min between three numbers
            edit_matrix[i][j] = edit_count

    min_edit = int(edit_matrix[len_w2 - 1, len_w1 - 1])
    # minimum edit count is the last number calculated
    
    return min_edit, edit_matrix

是的,您可以从最后回溯到可能有助于解决方案的单元格。
复杂度为 O((n+m) * num_solutions).

def getSolutions(edit_matrix, word1, word2):
  pos = [] 
  def backtrack(i,j):
    pos.append((i,j))
    if i==0 and j==0:
      # This is a solution
      print(pos)
    if i>0 and edit_matrix[i-1,j] + 1 == edit_matrix[i,j]:
      backtrack(i-1,j)
    if j>0 and edit_matrix[i,j-1] + 1 == edit_matrix[i,j]:
      backtrack(i, j-1)
    if i>0 and j>0 and (edit_matrix[i-1][j-1] + 1 if word1[j] != word2[i] else edit_matrix[i-1][j-1]) == edit_matrix[i,j]:
      backtrack(i,j)
    pos.pop()
  backtrack(len(word1) - 1,len(word2) - 1)

基于 Sorin 的好回答,这里有一个稍微增强的版本,它被修复以适合您的索引,并且还打印了在每种情况下需要完成的编辑操作:

def get_solutions(edit_matrix, word1, word2):
  pos = []
  sol = []
  def backtrack(i,j):
    pos.insert(0, (i,j))
    if i==0 and j==0:
      # This is a solution
      print(str(pos) + ": " + str(sol))
    if i>0 and edit_matrix[i-1,j] + 1 == edit_matrix[i,j]:
      sol.insert(0, "insert " + word2[i])
      backtrack(i-1,j)
      sol.pop(0)
    if j>0 and edit_matrix[i,j-1] + 1 == edit_matrix[i,j]:
      sol.insert(0, "remove " + word1[j])
      backtrack(i, j-1)
      sol.pop(0);
    if i>0 and j>0 and edit_matrix[i-1][j-1] + 1 if word1[j] != word2[i] else edit_matrix[i-1][j-1] == edit_matrix[i,j]:
      if (word1[j] != word2[i]):
        sol.insert(0, "replace " + word1[j] + " with " + word2[i])
      else:
        sol.insert(0, "skip")
      backtrack(i-1,j-1)
      sol.pop(0)
    pos.pop(0)
  word1 = ' ' + word1
  word2 = ' ' + word2
  backtrack(len(word2) - 1,len(word1) - 1)

的示例输出
count, matrix = min_edit_count("apple", "paper")
get_solutions(matrix, "apple", "paper")

然后看起来如下:

[(0, 0), (1, 0), (2, 1), (3, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['insert p', 'skip', 'skip', 'remove p', 'remove l', 'skip', 'insert r']
[(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['remove a', 'skip', 'insert a', 'skip', 'remove l', 'skip', 'insert r']
[(0, 0), (1, 0), (2, 1), (2, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['insert p', 'skip', 'remove p', 'skip', 'remove l', 'skip', 'insert r']
[(0, 0), (1, 1), (2, 2), (3, 3), (3, 4), (4, 5), (5, 5)]: ['replace a with p', 'replace p with a', 'skip', 'remove l', 'skip', 'insert r']
[(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 5)]: ['remove a', 'skip', 'replace p with a', 'replace l with p', 'skip', 'insert r']
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (5, 5)]: ['insert p', 'skip', 'skip', 'replace p with e', 'replace l with r', 'remove e']
[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (4, 4), (5, 5)]: ['insert p', 'skip', 'skip', 'replace p with e', 'remove l', 'replace e with r']
[(0, 0), (1, 0), (2, 1), (3, 2), (3, 3), (4, 4), (5, 5)]: ['insert p', 'skip', 'skip', 'remove p', 'replace l with e', 'replace e with r']
[(0, 0), (0, 1), (1, 2), (2, 2), (3, 3), (4, 4), (5, 5)]: ['remove a', 'skip', 'insert a', 'skip', 'replace l with e', 'replace e with r']
[(0, 0), (1, 0), (2, 1), (2, 2), (3, 3), (4, 4), (5, 5)]: ['insert p', 'skip', 'remove p', 'skip', 'replace l with e', 'replace e with r']
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]: ['replace a with p', 'replace p with a', 'skip', 'replace l with e', 'replace e with r']

//有趣:你可以试试看为什么每一行的长度都一样 :D