Keyword search in a text column of a data frame using a dictionary

I'm new to Python, and the requirements are quite specific. With my limited knowledge I'm stuck, so I'd appreciate any help with this problem.

I generated a dictionary from Excel that looks like this:

dict = {'Fruit' : {'Comb Words' : ['yellow',
                                   'elongated',
                                   'cooking'],
                   'Mandatory Word' : ['banana',
                                       'banana',
                                       'banana']},
       'Animal' : {'Comb Words' : ['mammal',
                                   'white',
                                   'domestic'],
                  'Mandatory Word' : ['cat',
                                      'cat',
                                      'cat']}}

Now I have a data frame with a text column, and I want to match the keywords in the dictionary against that column. For example:

            Text                     Mandatory      Comb            Final
A white domestic cat is playing        cat       domestic,white     Animal
yellow banana is not available        banana       yellow           Fruit

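This dictionary is just one idea; I can change it, since it comes from Excel input. Any other format or approach that produces the output above is fine, as that output is the only goal.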

Using a user-defined function:

import pandas as pd

Dict = {'Fruit' : {'Comb Words' : ['yellow',
                                   'elongated',
                                   'cooking'],
                   'Mandatory Word' : ['banana',
                                       'banana',
                                       'banana']},
       'Animal' : {'Comb Words' : ['mammal',
                                   'white',
                                   'domestic'],
                  'Mandatory Word' : ['cat',
                                      'cat',
                                      'cat']}}
                                      
df = pd.DataFrame({'Text':['A white domestic cat is playing',
                            'yellow banana is not available']})

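def findMCF(sentence):
    # Split once and reuse the word list for both lookups
    words = sentence.split()
    for mand in words:
        for final in Dict:
            wordtypeDict = Dict[final]
            mandList = wordtypeDict['Mandatory Word']
            if mand in mandList:
                # Collect the combination words that appear in the sentence
                C = [word for word in words if word in wordtypeDict['Comb Words']]
                return (mand,','.join(C),final)
    # No mandatory word found: return placeholders so the unpacking below still works
    return (None, None, None)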

df['Mandatory'],df['Comb'],df['Final'] = zip(*df['Text'].map(findMCF))

print(df)

Output:

                              Text Mandatory            Comb   Final
0  A white domestic cat is playing       cat  white,domestic  Animal
1   yellow banana is not available    banana          yellow   Fruit
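
Since the dictionary format is open to change, here is a rough sketch of a variant that might be easier to maintain. It assumes a flatter layout in which each category states its mandatory word once, and it lowercases the text and strips punctuation before matching, so that 'Cat.' still matches 'cat'. The names RULES and find_mcf_relaxed are made up for this illustration and are not part of the code above.

```python
import re

# Hypothetical flattened layout: the mandatory word is stated once per category
RULES = {
    'Fruit': {'mandatory': 'banana', 'comb': {'yellow', 'elongated', 'cooking'}},
    'Animal': {'mandatory': 'cat', 'comb': {'mammal', 'white', 'domestic'}},
}

def find_mcf_relaxed(sentence, rules=RULES):
    """Return (mandatory, comb, final), or three empty strings if nothing matches."""
    # Lowercase the text and keep only runs of letters/apostrophes,
    # which drops punctuation attached to a word
    words = re.findall(r"[a-z']+", sentence.lower())
    for word in words:
        for final, rule in rules.items():
            if word == rule['mandatory']:
                # Keep combination words in the order they appear in the sentence
                comb = [w for w in words if w in rule['comb']]
                return word, ','.join(comb), final
    return '', '', ''

print(find_mcf_relaxed('A white domestic Cat is playing.'))
# ('cat', 'white,domestic', 'Animal')
print(find_mcf_relaxed('nothing to match here'))
# ('', '', '')
```

It should be possible to apply this to the data frame the same way as findMCF, by passing it to df['Text'].map. Note that it stops at the first mandatory word it finds, so a sentence mentioning both a cat and a banana is assigned only one category.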