Named entity recognition in spaCy
I am trying to find named entities for the following sentence:
import spacy.lang.en
parser = spacy.lang.en.English()
ParsedSentence = parser(u"Alphabet is a new startup in China")
for Entity in ParsedSentence.ents:
    print(Entity.label, Entity.label_, ' '.join(t.orth_ for t in Entity))
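I expected to get "Alphabet" and "China", but the result is an empty set. What am I doing wrong here?
According to the spaCy documentation on named entity recognition, this is how to extract named entities: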
import spacy
nlp = spacy.load('en') # install 'en' model (python3 -m spacy download en)
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Result:
Name Entity: (China,)
To have "Alphabet" treated as a 'Noun', put "The" in front of it:
doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Name Entity: (Alphabet, China)
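The empty result in the question seems to come from the pipeline itself: `spacy.lang.en.English()` builds a bare pipeline with a tokenizer and no trained components, so nothing ever fills `doc.ents`. A minimal sketch of that check, assuming spaCy is installed (no model download is needed); `spacy.blank("en")` is used here as a shortcut that builds the same kind of bare pipeline:

```python
import spacy

# A blank English pipeline: tokenizer only, no tagger, parser or NER.
nlp = spacy.blank("en")
doc = nlp("Alphabet is a new startup in China")

# With no "ner" component in the pipeline, no entity spans are ever set.
print("Pipeline components:", nlp.pipe_names)
print("Entities:", doc.ents)
```

Running the same two prints on a pipeline returned by `spacy.load(...)` should list an `ner` component, which is what actually produces the entities.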
In spaCy version 3, Transformers from Hugging Face are fine-tuned for the operations that spaCy provided in previous versions, but with better results.
Transformers are currently (2020) the state of the art in Natural Language Processing. Roughly, we had (one-hot encoding -> word2vec -> GloVe | fastText), then (recurrent neural network, recursive neural network, gated recurrent unit, long short-term memory, bi-directional long short-term memory, etc.), and now Transformers + Attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; I know there is a lot that I didn't mention, like Fuzz, Knowledge Graphs and so on.
Install the dependencies:
sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2
Download the model:
python -m spacy download en_core_web_trf # English Transformer pipeline, Roberta base
Here is the list of available models.
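Before loading, it can help to confirm that the download actually landed in the current environment. A small sketch, assuming spaCy is installed; `MODEL` is just a local name chosen for this example:

```python
from spacy.util import is_package

MODEL = "en_core_web_trf"

# is_package() reports whether a pip-installed package with this name exists.
if is_package(MODEL):
    print(f"{MODEL} is installed and can be passed to spacy.load()")
else:
    print(f"{MODEL} is missing; run: python -m spacy download {MODEL}")
```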
Then use it as usual:
import spacy
text = 'Type something here which can be related to something, e.g Stack Over Flow organization'
nlp = spacy.load('en_core_web_trf')
document = nlp(text)
print(document.ents)
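Printing `document.ents` only shows the entity texts. To see the label assigned to each entity as well, something like the helper below could be used. This is a sketch: `entity_pairs` is a hypothetical helper name, and `fake_doc` is a stand-in object with made-up contents so the snippet runs without a downloaded model; with a real pipeline you would pass the `document` from above instead.

```python
from types import SimpleNamespace


def entity_pairs(doc):
    """Return (text, label) pairs for every entity span on a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


# Stand-in for a real Doc (illustrative data, not actual model output).
fake_doc = SimpleNamespace(ents=[SimpleNamespace(text="China", label_="GPE")])

for text, label in entity_pairs(fake_doc):
    print(f"{text}: {label}")
```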
References:
Learn about Transformers and Attention.
Read a summary of the different Transformers architectures.
Learn about the Transformers fine-tuning done by spaCy.