Keywords search in text column of data frame using dictionary
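I am new to Python and this requirement is very specific. With my limited knowledge I am stuck, and I would appreciate any help in solving it.
I generated a dictionary from Excel that looks like this: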
    dict = {'Fruit' : {'Comb Words' : ['yellow',
                                       'elongated',
                                       'cooking'],
                       'Mandatory Word' : ['banana',
                                           'banana',
                                           'banana']},
            'Animal' : {'Comb Words' : ['mammal',
                                        'white',
                                        'domestic'],
                        'Mandatory Word' : ['cat',
                                            'cat',
                                            'cat']}}
Now I have a data frame with a text column, and I want to match the keywords from the dictionary against that column. For example:
    Text                             Mandatory  Comb            Final
    A white domestic cat is playing  cat        domestic,white  Animal
    yellow banana is not available   banana     yellow          Fruit
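This dictionary layout is only an idea. I can change it, since it is just input coming from Excel. Any other format or approach that produces the output above is fine; that output is the only goal here.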
Using a user-defined function:
    import pandas as pd

    # Category -> combination words and mandatory words
    Dict = {'Fruit' : {'Comb Words' : ['yellow', 'elongated', 'cooking'],
                       'Mandatory Word' : ['banana', 'banana', 'banana']},
            'Animal' : {'Comb Words' : ['mammal', 'white', 'domestic'],
                        'Mandatory Word' : ['cat', 'cat', 'cat']}}

    df = pd.DataFrame({'Text':['A white domestic cat is playing',
                               'yellow banana is not available']})

    def findMCF(sentence):
        words = sentence.split()
        for mand in words:
            for final in Dict:
                wordtypeDict = Dict[final]
                mandList = wordtypeDict['Mandatory Word']
                if mand in mandList:
                    # Collect the combination words that occur in the sentence
                    C = [wrd for wrd in words if wrd in wordtypeDict['Comb Words']]
                    return (mand, ','.join(C), final)
        # No mandatory word found: return empty values so the unpacking below still works
        return ('', '', '')

    # Unpack the per-row tuples into three new columns
    df['Mandatory'], df['Comb'], df['Final'] = zip(*df['Text'].map(findMCF))
    print(df)
Output:
                                  Text  Mandatory            Comb   Final
    0  A white domestic cat is playing        cat  white,domestic  Animal
    1   yellow banana is not available     banana          yellow   Fruit
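Since the question says the dictionary layout can change, here is a rough sketch of a flatter variant. The names CATEGORIES and find_mcf and the 'mandatory' / 'comb' keys are made up for illustration and are not part of the original code. It assumes one mandatory word per category, lowercases the text and strips punctuation so that "Cat," still matches "cat", and returns empty strings when nothing matches. Note that it checks categories in dictionary order rather than word order, which may matter if a sentence contains mandatory words from two categories.

```python
import re

# Hypothetical flatter layout: one mandatory word and a set of
# combination words per category (an assumption, not the original format).
CATEGORIES = {
    'Fruit': {'mandatory': 'banana', 'comb': {'yellow', 'elongated', 'cooking'}},
    'Animal': {'mandatory': 'cat', 'comb': {'mammal', 'white', 'domestic'}},
}

def find_mcf(sentence):
    """Return (mandatory, comb, final) for the first matching category,
    or three empty strings if no mandatory word is present."""
    # Lowercase and keep only alphabetic runs, so case and punctuation are ignored
    words = re.findall(r"[a-z]+", sentence.lower())
    for final, spec in CATEGORIES.items():
        if spec['mandatory'] in words:
            # Combination words are reported in the order they appear in the sentence
            comb = [w for w in words if w in spec['comb']]
            return spec['mandatory'], ','.join(comb), final
    return '', '', ''

for text in ['A white domestic Cat, is playing',
             'yellow banana is not available',
             'nothing to match here']:
    print(find_mcf(text))
```

It should plug into the same zip(*df['Text'].map(find_mcf)) assignment used for the three new columns, with rows that contain no mandatory word getting empty strings.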