Trying to check the similarity of 2 columns full of strings with Python and Pysimilar or DamerauLevenshtein

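I'm trying to summarize texts with a summarizer. The problem is that I want to check whether the summaries are too similar to the original texts. From what I could read on Google, I can use packages like pysimilar or fastDamerauLevenshtein for that, but they seem to work on only one text at a time. Do you know how to do this for, say, 4 texts or more?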

import pandas as pd
from pysimilar import compare
from summarizers import Summarizers

summ = Summarizers()
data = ["The NN-CS89L offers next-level cooking convenience. Its four distinct cooking methods - steaming, baking, grilling and microwaving ensure your meals are cooked or reheated to perfection. Its multi-function capabilities can be combined to save time without compromising taste, texture or nutritional value. It’s the all-in-one kitchen companion designed for people with a busy lifestyle.", "These slim and stylish bodies are packed with high performance. The attractive compact designs and energy-saving functions help Panasonic Blu-ray products consume as little power as possible. You can experience great movie quality with this ultra-fast booting DMP-BD89 Full HD Blu-ray disc player. After starting the player, the time it takes from launching the menu to playing a disc is much shorter than in conventional models. The BD89 also allows for smart home networking (DLNA) and provides access to video on demand, so that home entertainment is more intuitive, more comfortable, and lots more fun."] 

df = pd.DataFrame(data, columns=['summaries'])
df['abstracts'] = df['summaries'].apply(summ)


compare(df.summaries, df.abstracts)

I get this error:
TypeError                                 Traceback (most recent call last)
<ipython-input-14-d1d78dc1f358> in <module>
----> 1 compare(df.summaries, df.abstracts)

~\Anaconda3\lib\site-packages\pysimilar\__init__.py in compare(self, string_i, string_j, isfile)
     89 
     90         if not isinstance(string_i, (str, Path)) or not isinstance(string_j, (str, Path)):
---> 91             raise TypeError(
     92                 'Both string i and string j must be of type either string or Path')
     93 

TypeError: Both string i and string j must be of type either string or Path

Thanks in advance!

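compare expects two single strings (or Paths), but you are passing it two whole columns, which is why it raises the TypeError. You need to create a function that receives a row containing the values of both columns and calls compare on them, and then apply that function to the DataFrame.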

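def compare_row_wise(row):
    # compare() takes two single strings, so pass one cell from each column
    return compare(row['summaries'], row['abstracts'])

# axis=1 calls the function once per row and returns one score per row
df.apply(compare_row_wise, axis=1)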
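
If it helps to see the same pairing idea in isolation, here is a minimal sketch that runs without pandas, pysimilar or summarizers. Everything in it is an assumption for illustration: `similarity` is a stand-in scorer built on the standard library's `difflib.SequenceMatcher` (it is not what pysimilar computes), `compare_pairs` is a hypothetical helper, and the sample texts are made up. The point is only that the scorer is called on one pair of strings at a time, however many texts you have.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): difflib ratio in [0, 1], not pysimilar's score."""
    return SequenceMatcher(None, a, b).ratio()


def compare_pairs(left, right, scorer=similarity):
    """Score left[i] against right[i], one pair of strings at a time."""
    return [scorer(a, b) for a, b in zip(left, right)]


# Made-up sample data: four texts and four shorter versions of them.
summaries = [
    "The oven steams, bakes, grills and microwaves.",
    "The player boots fast and supports DLNA.",
    "The camera has a 20x zoom lens.",
    "The headphones cancel background noise.",
]
abstracts = [
    "The oven steams, bakes, grills and microwaves.",
    "Fast booting player with DLNA.",
    "Camera with 20x zoom.",
    "Noise cancelling headphones.",
]

scores = compare_pairs(summaries, abstracts)
for text, score in zip(summaries, scores):
    print(f"{score:.2f}  {text}")
```

With a DataFrame, the scorer would take the place of compare inside compare_row_wise; the list version above just makes the one-pair-per-call structure explicit.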