Named entity recognition in spaCy

I am trying to find the named entities in the following sentence:

import spacy.lang.en
parser = spacy.lang.en.English()
ParsedSentence = parser(u"Alphabet is a new startup in China")
for Entity in ParsedSentence.ents:
    print(Entity.label, Entity.label_, ' '.join(t.orth_ for t in Entity))

I expected to get "Alphabet" and "China", but the result is an empty set. What am I doing wrong here?

According to the spaCy documentation on named entity recognition, this is how to extract named entities:

import spacy
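# Install the model first: python3 -m spacy download en
# ('en' is a spaCy 2.x shortcut; spaCy 3 requires a full package name such as 'en_core_web_sm')
nlp = spacy.load('en')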
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Result:
Name Entity: (China,)
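
As for why the snippet in the question prints nothing: `spacy.lang.en.English()` only builds a blank pipeline (tokenizer and language data) with no trained named entity recognizer, so `ents` stays empty. A minimal sketch to check this, assuming `spacy.blank("en")` as a stand-in that creates the same kind of blank English pipeline and needs no model download:

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
blank_nlp = spacy.blank("en")
blank_doc = blank_nlp("Alphabet is a new startup in China")

# There is no "ner" component, so no entities can be predicted.
print(blank_nlp.pipe_names)  # names of the pipeline components
print(blank_doc.ents)        # entities found in the text
```

Both values should come back empty, which matches the empty result in the question.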

To get "Alphabet" treated as a noun, put "The" in front of it.

doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Name Entity: (Alphabet, China)
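
Rewording the input is not always possible. Another option, which is not part of the answer above and is only sketched here, is spaCy's rule-based `EntityRuler`, which tags exact matches with a label you choose. The sketch assumes spaCy 3 (where components are added by name) and uses a blank pipeline so that it runs without a model; the `ORG` and `GPE` labels and both patterns are illustrative assumptions.

```python
import spacy

# Assumes spaCy 3.x. A blank pipeline keeps the example model-free.
nlp = spacy.blank("en")

# Rule-based component: marks exact phrase matches as entities.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Alphabet"},  # illustrative pattern
    {"label": "GPE", "pattern": "China"},     # illustrative pattern
])

doc = nlp("Alphabet is a new startup in China")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

With a trained pipeline, the ruler would typically be added before the statistical recognizer, e.g. `nlp.add_pipe("entity_ruler", before="ner")`, so that the model works around the hand-written matches.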

In spaCy version 3, Transformers from Hugging Face are fine-tuned for the operations that spaCy provided in earlier versions, but with better results.

Transformers are currently (2020) the state of the art in Natural Language Processing. Broadly, we had (one-hot encoding -> word2vec -> GloVe | fastText), then (recurrent neural networks, recursive neural networks, gated recurrent units, long short-term memory, bidirectional long short-term memory, etc.), and now Transformers + Attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; I know there is a lot that I didn't mention, like Fuzz, Knowledge Graphs and so on.

Install the dependencies:

sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2

Download the model:

python -m spacy download en_core_web_trf # English Transformer pipeline, RoBERTa base

Here is a list of the available models.
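
If `spacy.load` later fails with an `OSError`, the model package is probably not installed in the active environment. A small sketch for checking this up front: `spacy.util.is_package` is spaCy's own helper, while `require_model` is a hypothetical name introduced here for illustration.

```python
import spacy
from spacy.util import is_package


def require_model(name):
    """Load a spaCy pipeline, with a clearer error if it is not installed.

    `require_model` is a hypothetical helper, not part of spaCy.
    """
    if not is_package(name):
        raise RuntimeError(
            f"Model '{name}' is not installed; "
            f"try: python -m spacy download {name}"
        )
    return spacy.load(name)
```

It would be called as `nlp = require_model('en_core_web_trf')`.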

Then use it as usual:

import spacy


text = 'Type something here which can be related to something, e.g Stack Over Flow organization'

nlp = spacy.load('en_core_web_trf')

document = nlp(text)

print(document.ents)
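
`document.ents` prints only the entity texts. To see the labels and character offsets as well, something like the helper below can be used. `entity_rows` is a hypothetical name; `text`, `label_`, `start_char` and `end_char` are standard attributes of spaCy spans. So that the sketch runs without downloading a model, the demo annotates a blank-pipeline document by hand; with the transformer pipeline you would pass `nlp(text)` instead.

```python
import spacy
from spacy.tokens import Span


def entity_rows(doc):
    """Return (text, label, start_char, end_char) for each entity in doc."""
    return [(ent.text, ent.label_, ent.start_char, ent.end_char)
            for ent in doc.ents]


# Demo on a hand-annotated doc (a stand-in for a real pipeline's output).
demo_doc = spacy.blank("en")("Alphabet is a new startup in China")
demo_doc.ents = [
    Span(demo_doc, 0, 1, label="ORG"),  # token 0: "Alphabet"
    Span(demo_doc, 6, 7, label="GPE"),  # token 6: "China"
]

for row in entity_rows(demo_doc):
    print(row)
```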

References:

Learn about Transformers and Attention

Read a summary of the different Transformers architectures.

Learn about the Transformers fine-tuning done by spaCy