How can I pass a table or DataFrame, instead of plain text, to entity recognition with spaCy?

The code below adds an EntityRuler with patterns for several labels to a spaCy pipeline:

import spacy
import pandas as pd

# Load the small English model without its statistical NER,
# so only the rule-based EntityRuler produces entities
nlp = spacy.load('en_core_web_sm', disable=['ner'])
ruler = nlp.add_pipe("entity_ruler")

# Register one phrase pattern per term, grouped by label
flowers = ["rose", "tulip", "african daisy"]
ruler.add_patterns([{"label": "flower", "pattern": f} for f in flowers])
animals = ["cat", "dog", "artic fox"]
ruler.add_patterns([{"label": "animal", "pattern": a} for a in animals])

result = {}
doc = nlp("cat and artic fox, plant african daisy")
for ent in doc.ents:
    # One key per label: a later match overwrites an earlier one,
    # so "cat" is replaced by "artic fox"
    result[ent.label_] = ent.text
df = pd.DataFrame([result])
print(df)

Output:

      animal         flower
0  artic fox  african daisy
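Because result[ent.label_] = ent.text keeps only the last entity for each label, "cat" does not appear in the output above. If every match should be kept, one option is to collect a list per label. The sketch below assumes a blank English pipeline (spacy.blank("en")) in place of en_core_web_sm so it runs without a model download; the phrase patterns only need the tokenizer. The helper name entities_by_label is hypothetical.

```python
from collections import defaultdict

import pandas as pd
import spacy

# Assumption: a blank English pipeline instead of en_core_web_sm,
# so no model download is needed; the EntityRuler only uses the tokenizer here
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "flower", "pattern": p} for p in ["rose", "tulip", "african daisy"]]
    + [{"label": "animal", "pattern": p} for p in ["cat", "dog", "artic fox"]]
)

def entities_by_label(text):
    """Hypothetical helper: map each label to all matched spans, in document order."""
    found = defaultdict(list)
    for ent in nlp(text).ents:
        found[ent.label_].append(ent.text)
    return dict(found)

result = entities_by_label("cat and artic fox, plant african daisy")
print(result)  # {'animal': ['cat', 'artic fox'], 'flower': ['african daisy']}
df = pd.DataFrame([result])
```

Each DataFrame cell then holds a list, so no match is silently dropped.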

The question is: how can I pass a DataFrame or table instead of the text "cat and artic fox, plant african daisy"?

Assuming your DataFrame is:

df = pd.DataFrame({'Text':["cat and artic fox, plant african daisy"]})

You can define a custom function that extracts the entities and then use it with Series.apply:
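def get_entities(x):
    # Run the pipeline on one cell and map each label to its matched text
    # (as in the question, only the last match per label is kept)
    result = {}
    doc = nlp(x)
    for ent in doc.ents:
        result[ent.label_] = ent.text
    return result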

Then:

df['Matches'] = df['Text'].apply(get_entities)
>>> df['Matches']
0    {'animal': 'artic fox', 'flower': 'african daisy'}
Name: Matches, dtype: object
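For a DataFrame with many rows, a possible variant is to process the whole column with nlp.pipe, which streams the texts in batches instead of calling nlp once per row, and then expand the per-row dicts into one column per label. This is a sketch, not the only way: it assumes spacy.blank("en") in place of en_core_web_sm so no model download is needed, and the extra sample rows are made up for illustration.

```python
import pandas as pd
import spacy

# Assumption: blank English pipeline instead of en_core_web_sm (no model download)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "flower", "pattern": p} for p in ["rose", "tulip", "african daisy"]]
    + [{"label": "animal", "pattern": p} for p in ["cat", "dog", "artic fox"]]
)

# Illustrative data: the first row is the one from the question
df = pd.DataFrame({"Text": [
    "cat and artic fox, plant african daisy",
    "a tulip next to the dog",
    "nothing to see here",
]})

# nlp.pipe yields one Doc per text; like get_entities, the dict
# comprehension keeps only the last match for each label
matches = [
    {ent.label_: ent.text for ent in doc.ents}
    for doc in nlp.pipe(df["Text"])
]
df["Matches"] = matches

# Expand the dicts into one column per label; rows without a label get NaN
expanded = pd.DataFrame(matches, index=df.index)
out = df.join(expanded)
print(out[["Text", "animal", "flower"]])
```

The Matches column holds the same dicts as the Series.apply version, while the joined animal and flower columns make the result easy to filter or export.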