python - fuzzy matching, looping through a data set to find corresponding items in the reference set
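I'm trying to learn and implement fuzzy matching in Python. I have two data sets, which I load into pandas as DataFrames. Set 1 is the reference set. Set 2 contains the data to be matched against the reference names.
I loop through the set_1 items to search for the corresponding entries in the reference, but I get an error. I need some help resolving it.
Am I structuring the algorithm in a good way?
My attempt: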
import pandas as pd
import fuzzywuzzy as fuzzy
from difflib import SequenceMatcher
set_1 = pd.read_csv("C:/Folder/file_1.csv")
set_2 = pd.read_csv("C:/Folder/file_2.csv")
query = set_1['name']
choices = set_2['name2']
for query in query:
    match = fuzzy.extractOne(query,choises=choises,scorer=scorer,score_cutoff=cutoff)
I get the following error:
AttributeError: module 'fuzzywuzzy' has no attribute 'extractOne'
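If you look at the package's usage on GitHub, you'll notice that extractOne is a function defined in fuzzywuzzy.process, so you need to import that submodule like this (note also that choises is a typo for choices, and that scorer and cutoff must be defined before the call):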
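import pandas as pd
from fuzzywuzzy import fuzz, process  # <-- note the difference

set_1 = pd.read_csv("C:/Folder/file_1.csv")
set_2 = pd.read_csv("C:/Folder/file_2.csv")
queries = set_1['name']
choices = set_2['name2'].tolist()  # a plain list makes extractOne return (match, score)
scorer = fuzz.WRatio  # example scorer; use whichever suits your data
cutoff = 80           # example threshold; below it extractOne returns None
matches = []
for query in queries:
    # vvvvvvv note the difference
    match = process.extractOne(query, choices, scorer=scorer, score_cutoff=cutoff)
    matches.append(match)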
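On the question of structure: looping over the names and keeping the best-scoring candidate above a threshold is a reasonable approach. For comparison, here is a minimal, dependency-free sketch of the same idea using difflib.SequenceMatcher from the standard library (the question imports it but never uses it). best_match is a hypothetical helper that mimics extractOne's (choice, score) result and its score_cutoff behaviour; the scores are SequenceMatcher ratios scaled to 0-100, not fuzzywuzzy's WRatio, and the sample names are made up.

```python
from difflib import SequenceMatcher


def best_match(query, choices, score_cutoff=0):
    """Return (choice, score) for the most similar choice, or None.

    Hypothetical helper: scores are SequenceMatcher ratios scaled to 0-100
    so they resemble fuzzywuzzy scores, but this is NOT fuzzywuzzy's WRatio.
    """
    best = None
    for choice in choices:
        # Lower-case both sides so "ACME" and "Acme" count as identical
        score = round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio())
        # Keep the highest score that clears the cutoff
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (choice, score)
    return best


# Made-up sample data standing in for set_1['name'] and set_2['name2']
names = ["Acme Corp", "Globex Inc", "Initech"]
reference = ["ACME Corporation", "Globex Incorporated", "Umbrella Ltd"]

matches = {name: best_match(name, reference, score_cutoff=60) for name in names}
for name, match in matches.items():
    print(name, "->", match)
# Acme Corp -> ('ACME Corporation', 72)
# Globex Inc -> ('Globex Incorporated', 69)
# Initech -> None
```

Each query is compared against every reference name, so the work grows with len(set_1) × len(set_2) whichever library does the scoring.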