Comparing data using a lookup and outputting only the longest phrase in the data using Python?
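I have a CSV that contains a mapping from "KKR" to "MBI" data. I want to look up user-supplied text against it and extract the longest matching phrases from KKR (if a shorter phrase is made up of words already contained in a longer matching phrase, the shorter one should be ignored).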
import pandas as pd

# os.chdir("kkr_lookup")
dfData = pd.read_csv("KKR_MBI_MAP.csv")
dataVerbatim = {'verbatim': ['She experienced skin allergy and hair loss after using it for 2-3 weeks']}
dfVerbatim = pd.DataFrame(dataVerbatim, columns=['verbatim'])

for index, frame in dfData.iterrows():
    # Skip rows without a KKR term, then check whether the term occurs in the verbatim text
    if pd.notnull(frame['KKR']) and dfVerbatim['verbatim'].str.contains(frame['KKR'], case=False).any():
        k = frame['MBI'].lower()
        l = frame['KKR'].lower()
        print("MBI:", l)
        # print("MBI:", k)
The code gives the following output:
allergy
hair loss
skin allergy
But I need:
skin allergy
hair loss
Here I have written code to extract terms from the user's input data. However, it extracts both "allergy" and "skin allergy", whereas I only need "skin allergy" here.
Please help me...
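import re

list_of_strings = ["skin allergy", "hair loss", "allergy", "hair", "skin"]
sentence = "She experienced skin allergy and hair loss after using it for 2-3 weeks"
# Try longer phrases first so "skin allergy" wins over "allergy" and "skin"
alternatives = "|".join(map(re.escape, sorted(list_of_strings, key=len, reverse=True)))
# Keep both \b anchors outside the group so they apply to every alternative
pattern = re.compile(r"\b(" + alternatives + r")\b")
m = pattern.findall(sentence)
print(m)  # ['skin allergy', 'hair loss']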
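To connect the regex idea back to the KKR/MBI lookup from the question, here is a minimal sketch. It assumes the CSV has the two columns "KKR" and "MBI" as described. The small inline DataFrame only stands in for KKR_MBI_MAP.csv, its MBI values are invented for illustration, and `longest_matches` is a hypothetical helper name.

```python
import re
import pandas as pd

# Hypothetical stand-in for KKR_MBI_MAP.csv; the MBI values are made up.
mapping = pd.DataFrame({
    "KKR": ["allergy", "hair loss", "skin allergy", "hair", None],
    "MBI": ["Hypersensitivity", "Alopecia", "Dermatitis allergic", "Hair disorder", "Unmapped"],
})

def longest_matches(text, mapping):
    """Return (KKR, MBI) pairs, preferring longer KKR phrases over shorter ones."""
    rows = mapping.dropna(subset=["KKR"])
    # Lowercased KKR phrase -> MBI value, used after matching
    lookup = {k.lower(): v for k, v in zip(rows["KKR"], rows["MBI"])}
    if not lookup:
        return []
    # Longest phrases first, so the alternation tries "skin allergy" before "allergy"
    phrases = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, phrases)) + r")\b",
        re.IGNORECASE,
    )
    # finditer yields non-overlapping matches, so text consumed by a longer
    # phrase is not matched again by a shorter one
    return [(m.group(0).lower(), lookup[m.group(0).lower()])
            for m in pattern.finditer(text)]

sentence = "She experienced skin allergy and hair loss after using it for 2-3 weeks"
for kkr, mbi in longest_matches(sentence, mapping):
    print("KKR:", kkr, "| MBI:", mbi)
```

One caveat: the regex engine takes the leftmost match first, so sorting by length only decides between phrases that start at the same position. If two phrases overlap partially, the one that starts earlier in the text wins even when it is shorter.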