How to calculate Levenshtein ratio/distance for rows in my column in Python?

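I have a data frame with a single column of 1000 rows. I need to compare every row with every other row and find the Levenshtein distance for each pair. How can I calculate the ratio or distance in Python?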

I have a data frame as follows:

  #Df 
  StepDescription
  click confirm button when done
  you have logged on
  please log in to proceed
  click on confirm button
  Dolb was released successfully
  Enter your details
  validate the statement
  Aval was released sucessfully

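How can I calculate the Levenshtein ratio for all of these?

I have written code to loop over the rows, but I don't know how to proceed once I am inside the iteration.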

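  import Levenshtein
  import pandas as pd
  # raw string, so the backslash in the Windows path is not read as an escape
  data_dist = pd.read_csv(r'path\Data_TestDescription.csv')
  df = pd.DataFrame(data_dist)
  for index, row in df.iterrows():
      pass  # how do I proceed from here?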

As asked in the comments, a percentage is wanted, so I will keep the accepted answer as it is and only add the new part:

import numpy as np
import pandas as pd
from Levenshtein import distance
from itertools import product

#df = ...

dist = [distance(*x) for x in product(df.StepDescription, repeat=2)]

dist_df = pd.DataFrame(np.array(dist).reshape(df.shape[0], df.shape[0]))
dist_df

    0   1   2   3   4   5   6   7
0   0  23  23  13  29  25  25  28
1  23   0  18  18  23  18  18  23
2  23  18   0  20  25  21  19  24
3  13  18  20   0  27  19  21  26
4  29  23  25  27   0  26  23   5
5  25  18  21  19  26   0  19  25
6  25  18  19  21  23  19   0  21
7  28  23  24  26   5  25  21   0

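# express each distance relative to the smallest non-zero distance (which becomes 100);
# multiply before the floor division so the fractional part is not thrown away
dist_df_percentage = dist_df * 100 // min(x for x in dist if x > 0)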

     0    1    2    3    4    5    6    7
0    0  460  460  260  580  500  500  560
1  460    0  360  360  460  360  360  460
2  460  360    0  400  500  420  380  480
3  260  360  400    0  540  380  420  520
4  580  460  500  540    0  520  460  100
5  500  360  420  380  520    0  380  500
6  500  360  380  420  460  380    0  420
7  560  460  480  520  100  500  420    0
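
The percentages above are relative to the smallest non-zero distance, which is why they go beyond 100. If what you want is a similarity between 0 and 100 for each pair, one common option is to normalise the distance by the length of the longer string. Below is a minimal, dependency-free sketch of that idea. The helper names `levenshtein` and `similarity_percent` are my own (hypothetical), the distance is a plain dynamic-programming implementation rather than the `Levenshtein` package, and the normalisation is just one possible choice, so treat it as an illustration and not as the library's own `ratio`.

```python
from itertools import product

def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,            # delete ca
                            curr[j - 1] + 1,        # insert cb
                            prev[j - 1] + cost))    # substitute (or match)
        prev = curr
    return prev[-1]

def similarity_percent(a, b):
    """100 * (1 - distance / length of the longer string)."""
    longest = max(len(a), len(b))
    if longest == 0:                        # two empty strings are identical
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)

# A few of the rows from the question, used as sample input
steps = [
    "click confirm button when done",
    "click on confirm button",
    "Dolb was released successfully",
    "Aval was released sucessfully",
]

n = len(steps)
flat = [similarity_percent(a, b) for a, b in product(steps, repeat=2)]
matrix = [flat[r * n:(r + 1) * n] for r in range(n)]   # n x n list of lists
```

With a real data frame you could pass `df.StepDescription` in place of `steps` and wrap `matrix` in `pd.DataFrame(matrix)`. Identical strings score 100, and the score drops towards 0 as more edits are needed.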

Finally, after trying a lot of examples, I got an accurate ratio (percentage) by using fuzz.ratio from fuzzywuzzy:

from itertools import product
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz

# df = ...  (the data frame with the StepDescription column)

# one similarity score per ordered pair of rows
dist = np.empty(df.shape[0]**2, dtype=int)
for i, x in enumerate(product(df.StepDescription, repeat=2)):
    dist[i] = fuzz.ratio(*x)
dist_df = pd.DataFrame(dist.reshape(-1, df.shape[0]))
# to_csv returns None when a path is given, so there is nothing to assign
dist_df.to_csv('FuzzyRatio.csv', sep='\t')