How to use a VLOOKUP-like lookup in Python to find text in a DataFrame?
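I would like to use a function similar to VLOOKUP/map in Python.
I only have part of the full name for some companies. I want to know whether each company appears in the dataset, as in the example below.
Thanks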
df1['in DATASET'] = df1['NAME'].isin(df2['FULL DATASET'])
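I can recreate the result by comparing one list against the other. Your matching criteria are not very clear or logical. "john usa" is a successful match with "aviation john" because "john" appears in both. But would "john usa" also count as a match with "usa mark sas" because "usa" appears in both? What about hyphens, commas, and so on?
It would help if you could clarify this.
Anyway, I hope the following helps, and good luck: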
import pandas as pd

# Create two lists of records based on the existing dataframes.
check_list = list(df_check.to_records(index=False))
full_list = list(df_full.to_records(index=False))
# Create a set: entries in a set are unique.
results = set()
for check in check_list:  # for each record to check...
    # Take the first column and split it into words on whitespace.
    # The record counts as found if any of its words is a substring
    # of the first column of any record in the full list.
    found = any(search_word in rec[0]
                for search_word in check[0].split()
                for rec in full_list)
    # Add the record we checked with its result (the set avoids duplicate entries).
    results.add((check[0], found))
# Build a dataframe from the results (pandas does not accept a set directly).
df_results = pd.DataFrame(sorted(results), columns=["check", "found"])
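For comparison, here is a minimal sketch of the same idea written directly against the dataframes from the question, using `Series.str.contains` instead of explicit loops. The sample rows and the helper name `partial_match` are made up for illustration, and it assumes the loose rule discussed above: a name counts as found if any one of its words appears, case-insensitively, anywhere in any full name. That rule may well be too permissive for real data.

```python
import re

import pandas as pd

# Hypothetical sample data; the column names follow the question's snippet.
df1 = pd.DataFrame({"NAME": ["john usa", "mark", "peter ltd"]})
df2 = pd.DataFrame({"FULL DATASET": ["aviation john", "usa mark sas", "acme corp"]})


def partial_match(name: str, full_names: pd.Series) -> bool:
    """Return True if any whitespace-separated word of `name` occurs in any full name."""
    # Escape each word so punctuation such as "." or "+" is matched literally.
    words = [re.escape(word) for word in name.split()]
    if not words:
        return False
    pattern = "|".join(words)
    # na=False treats missing full names as non-matches.
    hits = full_names.str.contains(pattern, case=False, regex=True, na=False)
    return bool(hits.any())


df1["in DATASET"] = df1["NAME"].apply(
    lambda name: partial_match(name, df2["FULL DATASET"])
)
print(df1)
```

Because this is plain substring matching, "usa" would also match a name containing "usage". If whole-word matches are wanted, wrapping each escaped word in `\b...\b` in the pattern is one option.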