Count matching values between dictionary and dataframe considering keys / column order in python

Question

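I want to match one specific data entry (a dictionary/DataFrame with 20 variables) against a database to find entries that may be the same record.

Since there is no unique identifier and some entries have many missing values, I want to make a "naive" guess: count the matching values row by row and take the 10 rows with the most matches as prospects.

Currently, I convert the dictionary to a list and use .isin() to count the matching values.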

db['no_matches'] = db.isin(list_of_criterias).sum(1)
prospects = db.nlargest(10, ['no_matches'])

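However, my approach is misleading, because it counts matches without taking the column order/name into account.

That is, if my search value is column1 = 'foo', it also matches 'foo' values in my database that are not in column1.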
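
A minimal sketch of the problem, using a made-up two-column table (the data, the column names, and the `criteria` dict are assumptions for illustration only). The second row has the same values in swapped columns and still gets the full score:

```python
import pandas as pd

# Hypothetical toy database; not the real data from the question.
db = pd.DataFrame({
    "column1": ["foo", "bar", "baz"],
    "column2": ["bar", "foo", "qux"],
})

# Search criteria: column1 == 'foo' and column2 == 'bar'.
criteria = {"column1": "foo", "column2": "bar"}

# A flat list carries no column information, so every cell is
# compared against every search value.
list_counts = db.isin(list(criteria.values())).sum(axis=1)

print(list_counts.tolist())  # [2, 2, 0] -> row 1 is a false positive
```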

Is there a way to count matching values row by row while also taking the column order into account?

Thanks.

Update:

Following Quang Hoang's comment, I passed the corresponding dictionary to .isin(). However, I get a TypeError.

In[9]: type(clean_criteria)
Out[9]: dict

db.isin(clean_criteria) #Throws Error

TypeError: only list-like or dict-like objects are allowed to be passed to DataFrame.isin(), you passed a 'str'
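
The message names a 'str' even though a dict was passed. A likely explanation, assuming the values inside clean_criteria are plain strings: with a dict, pandas checks each column against that key's value, and a scalar string is not list-like. The sketch below reproduces this with hypothetical data and shows that wrapping each value in a list avoids the error:

```python
import pandas as pd

# Hypothetical data; the real db and clean_criteria are not shown in the post.
db = pd.DataFrame({"column1": ["foo", "bar"], "column2": ["bar", "foo"]})
clean_criteria = {"column1": "foo", "column2": "bar"}  # scalar values

try:
    db.isin(clean_criteria)
except TypeError as exc:
    # Expected with scalar dict values: each value must be list-like.
    print(exc)

# Wrap every scalar in a one-element list so each value is list-like.
list_valued = {col: [val] for col, val in clean_criteria.items()}
mask = db.isin(list_valued)

print(mask.sum(axis=1).tolist())  # [2, 0]
```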

Answer 1

Solution proposed in / derived from the comments (posted as community wiki):

dict_criteria = df_criteria.to_dict('list')

db['no_matches'] = db.isin(dict_criteria).sum(1)
prospects = db.nlargest(10, ['no_matches'])

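Explanation

.to_dict('list') -- the 'list' argument turns each dict value from a scalar into a list/array object
.isin() -- passing a list matches values in any column, regardless of column order, whereas passing a dictionary matches each key's values only against the column with the same name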
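
The steps above can be sketched end to end as follows. The table contents, the one-row df_criteria, and the choice of the top 2 rows (instead of 10) are assumptions made to keep the example small:

```python
import pandas as pd

# Hypothetical database and a one-row criteria frame with the same columns.
db = pd.DataFrame({
    "column1": ["foo", "bar", "foo", "baz"],
    "column2": ["bar", "foo", "qux", "bar"],
    "column3": [1, 2, 1, 3],
})
df_criteria = pd.DataFrame(
    {"column1": ["foo"], "column2": ["bar"], "column3": [1]}
)

# Each key maps to a list of accepted values for that column.
dict_criteria = df_criteria.to_dict(orient="list")

# Column-aware match count per row, stored on a copy of db.
scored = db.assign(no_matches=db.isin(dict_criteria).sum(axis=1))

# Keep the rows with the most matches (top 2 here, top 10 in the question).
prospects = scored.nlargest(2, "no_matches")

print(scored["no_matches"].tolist())  # [3, 0, 2, 1]
print(prospects.index.tolist())       # [0, 2]
```

Using assign keeps the score out of the original db, so a later call to isin does not also compare against a leftover no_matches column.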

python

similarity

match

dataframe

pandas