Named Entity Recognition failing to show lists
I'm applying named entity recognition (NER) to tokenized text that has also been tagged.
import operator


def make_tag_lists(sents):
    # Flatten the sentences into three parallel lists:
    # token text, part-of-speech tag and entity type.
    tokens = []
    pos = []
    ner = []
    for sent in sents:
        for t in sent:
            tokens.append(t.text)
            pos.append(t.pos_)
            ner.append(t.ent_type_)  # "" for tokens outside any entity
    return tokens, pos, ner


tokens, pos, ner = make_tag_lists(sample)


def extract_entities(tokenlist, taglist, tagtype):
    # Count runs of consecutive tokens whose tag equals tagtype.
    entities = {}
    inentity = False
    for i, (token, tag) in enumerate(zip(tokenlist, taglist)):
        if tag == tagtype:
            if inentity:
                entity += " " + token
            else:
                entity = token
                inentity = True
        elif inentity:
            entities[entity] = entities.get(entity, 0) + 1
            inentity = False
    return entities


people = extract_entities(tokens, ner, "PERSON")
top_people = sorted(people.items(), key=operator.itemgetter(1), reverse=True)[:20]
print(top_people)
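I expect to get a list of the 20 most frequently mentioned people, but my output is currently an empty list. There are no syntax errors, and I'm not sure where it goes wrong.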
I suggest skipping the first block of code and checking the rest of the execution flow with hand-made tokens and tags:
import operator

# tokens, pos, ner = make_tag_lists(sample)
# Hand-made data in place of the tagging step:
tokens = ['Hi', 'FOO', 'BAR', "it's", 'ME']
ner = ['MISC', 'PERSON', 'PERSON', 'MISC', 'PERSON']


def extract_entities(tokenlist, taglist, tagtype):
    # Unchanged from the question.
    entities = {}
    inentity = False
    for i, (token, tag) in enumerate(zip(tokenlist, taglist)):
        if tag == tagtype:
            if inentity:
                entity += " " + token
            else:
                entity = token
                inentity = True
        elif inentity:
            entities[entity] = entities.get(entity, 0) + 1
            inentity = False
    return entities


people = extract_entities(tokens, ner, "PERSON")
top_people = sorted(people.items(), key=operator.itemgetter(1), reverse=True)[:20]
print(top_people)
The result for this example is [('FOO BAR', 1)].
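Because extract_entities returns a non-empty result when the tags contain "PERSON", an empty top_people points at the tags produced by the first block. A quick check is to count which labels actually appear in ner. The sketch below uses hypothetical tags rather than real model output: it assumes a pipeline whose person label is "PER" (a scheme some non-English spaCy pipelines use). If "PERSON" is absent from the counts, or every tag is empty (as it would be if the pipeline has no NER component), extract_entities(tokens, ner, "PERSON") can only return an empty dictionary.

```python
from collections import Counter

# Hypothetical tags, e.g. from a pipeline whose person label is "PER".
# spaCy's Token.ent_type_ is "" for tokens outside any entity.
tokens = ["Hi", "FOO", "BAR", "it's", "ME"]
ner = ["", "PER", "PER", "", "PER"]

# Count the entity labels that actually occur (ignoring empty tags).
label_counts = Counter(tag for tag in ner if tag)
print(label_counts)  # Counter({'PER': 3})

# No tag is the exact string "PERSON", so matching on it finds nothing.
person_tokens = [tok for tok, tag in zip(tokens, ner) if tag == "PERSON"]
print(person_tokens)  # []
```

With real data, running the same Counter call on the ner list returned by make_tag_lists(sample) shows the label set the model actually emits.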
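Also note that the [('FOO BAR', 1)] result is missing the last PERSON entity, 'ME'. An entity is only added to the entities dictionary when a token with a different tag follows it, so an entity that runs to the end of the token list is never recorded.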
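A minimal sketch of a corrected extract_entities, assuming the same parallel token and tag lists: it records the entity that is still open when the loop ends, and uses collections.Counter in place of the dictionary plus the sorted/itemgetter step. This is one possible rewrite, not the only one.

```python
from collections import Counter


def extract_entities(tokenlist, taglist, tagtype):
    """Count runs of consecutive tokens whose tag equals tagtype."""
    entities = Counter()
    current = []  # tokens of the entity currently being built
    for token, tag in zip(tokenlist, taglist):
        if tag == tagtype:
            current.append(token)
        elif current:
            # A token with a different tag ends the current entity.
            entities[" ".join(current)] += 1
            current = []
    # Record an entity that runs to the end of the list (e.g. the final "ME").
    if current:
        entities[" ".join(current)] += 1
    return entities


tokens = ['Hi', 'FOO', 'BAR', "it's", 'ME']
ner = ['MISC', 'PERSON', 'PERSON', 'MISC', 'PERSON']

people = extract_entities(tokens, ner, "PERSON")
top_people = people.most_common(20)  # 20 most frequent, ties in first-seen order
print(top_people)  # [('FOO BAR', 1), ('ME', 1)]
```

Like the original, this joins any run of consecutive tokens with the target tag, so two different people with no token between them would be merged into one name. With spaCy, the IOB marker in Token.ent_iob_ or the spans in doc.ents would keep them apart.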