Find csv lines by word similarity
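I have a csv file with thousands of lines. I only want to retrieve the lines that have some similarity to a specific word. In this case, I expect to catch lines 1, 2 and 4.
Any idea how to achieve this?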
import csv

a = 'Microsoft'
f = open("testing.csv")
reader = csv.reader(f, delimiter='\n')
for row in reader:
    # Exact substring test: only catches lines spelling the word exactly
    if a in row[0]:
        print(row[0])
testing.csv
I like very much the Microsoft products
Me too, I like Micrsoft
I prefer Apple products
microfte here
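The fuzzywuzzy library is a good fit for this. Given your test data and expected results, I assume case doesn't matter, so I uppercase both the word to compare and the test data: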
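from fuzzywuzzy import fuzz
import csv

word = 'Microsoft'.upper()
with open('testing.csv') as f:
    reader = csv.reader(f, delimiter='\n')
    for row in reader:
        # Split the line into words and score each one against the target
        a = row[0].split(' ')
        # Keep the line if its best-matching word scores above 80
        if max([fuzz.ratio(word, x.upper()) for x in a]) > 80:
            print(row[0])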
Result:
$ python test.py
I like very much the Microsoft products
Me too, I like Micrsoft
microfte here
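
If installing fuzzywuzzy is not an option, roughly the same idea can be sketched with the standard library's difflib.SequenceMatcher. This is only an illustration under a few assumptions: the function name similar_lines and the 0.8 threshold are made up for this example, the sample lines are copied from testing.csv above, and the scores are not guaranteed to match fuzz.ratio exactly.

```python
import re
from difflib import SequenceMatcher


def similar_lines(lines, word, threshold=0.8):
    """Return the lines containing at least one token similar to `word`.

    `similar_lines` and `threshold` are hypothetical names for this sketch.
    """
    target = word.lower()
    matches = []
    for line in lines:
        # Split on non-word characters so trailing punctuation is not scored
        tokens = [t for t in re.split(r'\W+', line.lower()) if t]
        # ratio() is in [0, 1]; keep the line if any token exceeds the threshold
        if any(SequenceMatcher(None, target, t).ratio() > threshold for t in tokens):
            matches.append(line)
    return matches


# Sample data copied from testing.csv in the question
sample = [
    "I like very much the Microsoft products",
    "Me too, I like Micrsoft",
    "I prefer Apple products",
    "microfte here",
]

result = similar_lines(sample, "Microsoft")
for line in result:
    print(line)
```

The threshold of 0.8 plays the same role as the score of 80 in the answer above. Lowering it catches more misspellings but also more unrelated words, so it is worth tuning against a sample of the real file.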