How can I pass a table or DataFrame instead of text for entity recognition using spaCy?
The following shows how to add multiple EntityRuler patterns with spaCy. The code to do this is below:
import spacy
import pandas as pd
from spacy.pipeline import EntityRuler

# Load the model without its statistical NER so only the rule-based matches are returned
nlp = spacy.load('en_core_web_sm', disable=['ner'])
ruler = nlp.add_pipe("entity_ruler")

flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])

animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])

result = {}
doc = nlp("cat and artic fox, plant african daisy")
for ent in doc.ents:
    result[ent.label_] = ent.text  # keeps only the last match per label

df = pd.DataFrame([result])
print(df)
Output:
      animal         flower
0  artic fox  african daisy
The question is: how can I pass a DataFrame or table instead of the text "cat and artic fox, plant african daisy"?
Assuming your DataFrame is
df = pd.DataFrame({'Text':["cat and artic fox, plant african daisy"]})
you can define a custom function to extract the entities and then use it with Series.apply:
def get_entities(x):
    result = {}
    doc = nlp(x)
    for ent in doc.ents:
        result[ent.label_] = ent.text
    return result
Then
df['Matches'] = df['Text'].apply(get_entities)
>>> df['Matches']
0    {'animal': 'artic fox', 'flower': 'african daisy'}
Name: Matches, dtype: object
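Because the result dictionary is keyed by label, a second match with the same label overwrites the first, which is why "cat" does not appear in the result. If every match is needed, or the labels should become separate columns, something along the following lines may help. This is only a sketch under a few assumptions: it uses spacy.blank("en") instead of en_core_web_sm (the phrase patterns only need the tokenizer), entities_by_label is a hypothetical helper name, and the second sample row is made up for illustration.

```python
from collections import defaultdict

import pandas as pd
import spacy

# Assumption: a blank English pipeline is enough here, because the phrase
# patterns below only rely on the tokenizer (no trained model is loaded).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "flower", "pattern": f} for f in ["rose", "tulip", "african daisy"]]
    + [{"label": "animal", "pattern": a} for a in ["cat", "dog", "artic fox"]]
)


def entities_by_label(doc):
    """Collect every matched entity text under its label (hypothetical helper)."""
    found = defaultdict(list)
    for ent in doc.ents:
        found[ent.label_].append(ent.text)
    return dict(found)


# The second row is invented purely to show a multi-row DataFrame.
df = pd.DataFrame(
    {"Text": ["cat and artic fox, plant african daisy", "a dog and a tulip"]}
)

# nlp.pipe streams the texts through the pipeline instead of one call per row.
docs = nlp.pipe(df["Text"])
matches = pd.DataFrame([entities_by_label(doc) for doc in docs], index=df.index)

# One column per label, aligned with the original rows; each cell is a list.
out = df.join(matches)
print(out)
```

Each cell in the animal and flower columns holds a list of matches, so rows with several entities of the same label keep all of them. For a large DataFrame, nlp.pipe is generally preferable to calling nlp inside Series.apply.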