How can I iterate token attributes with coreference results in CoreNLP?
I'm looking for a way to extract and merge annotation results from CoreNLP. To be specific:
import stanza
import os
from stanza.server import CoreNLPClient

corenlp_dir = '/Users/fatih/stanford-corenlp-4.2.0/'
os.environ['CORENLP_HOME'] = corenlp_dir

client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'coref'],
    memory='4G',
    endpoint='http://localhost:9001',
    be_quiet=True)

text = "Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008."
doc = client.annotate(text)

# Print the animacy of every mention in every coreference chain
for x in doc.corefChain:
    for y in x.mention:
        print(y.animacy)
ANIMATE
ANIMATE
ANIMATE
I want to merge these results with the results from the following code:
# Print word, lemma, POS tag and NER tag for every token, sentence by sentence
for i, sent in enumerate(doc.sentence):
    print("[Sentence {}]".format(i+1))
    for t in sent.token:
        print("{:12s}\t{:12s}\t{:6s}\t{}".format(t.word, t.lemma, t.pos, t.ner))
    print("")
[Sentence 1]
Barack Barack NNP PERSON
Obama Obama NNP PERSON
was be VBD O
born bear VBN O
in in IN O
Hawaii Hawaii NNP STATE_OR_PROVINCE
. . . O
[Sentence 2]
He he PRP O
is be VBZ O
the the DT O
president president NN TITLE
. . . O
[Sentence 3]
Obama Obama NNP PERSON
was be VBD O
elected elect VBN O
in in IN O
2008 2008 CD DATE
. . . O
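Because the annotations are stored in different objects, I can't iterate over the two objects together and get the results for the related items.
Is there a way to do this?
Thanks.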
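Each mention in a coref chain has a sentenceIndex and a beginIndex, which should correspond to a position in the sentence. You can use them to correlate the two.
Edit: a quick and dirty change to your example code: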
from collections import defaultdict
from stanza.server import CoreNLPClient

client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'coref'],
    be_quiet=False)

text = "Barack Obama was born in Hawaii. In 2008 he became the president."
doc = client.annotate(text)

# animacy[sentence index][token index] is set for every token inside a coref mention
animacy = defaultdict(dict)
for x in doc.corefChain:
    for y in x.mention:
        print(y.animacy)
        for i in range(y.beginIndex, y.endIndex):
            # Quick and dirty: this flags membership in a mention, not the animacy value itself
            animacy[y.sentenceIndex][i] = True
            print(y.sentenceIndex, i)

# Walk the tokens and look up each (sentence, token) position in the table
for sent_idx, sent in enumerate(doc.sentence):
    print("[Sentence {}]".format(sent_idx+1))
    for t_idx, token in enumerate(sent.token):
        animate = animacy[sent_idx].get(t_idx, False)
        print("{:12s}\t{:12s}\t{:6s}\t{:20s}\t{}".format(token.word, token.lemma, token.pos, token.ner, animate))
    print("")
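If you want the animacy label itself on each token rather than a True/False flag, the same lookup-table idea can store the label. The following is only a minimal sketch that runs without a CoreNLP server: the Mention class and the hand-written sentences and chains are hypothetical stand-ins for doc.sentence and doc.corefChain, and it assumes (as the code above does) that beginIndex and endIndex are 0-based token offsets within the sentence, with endIndex exclusive.

```python
from collections import defaultdict
from typing import NamedTuple


class Mention(NamedTuple):
    """Hypothetical stand-in for a coref mention (not the real protobuf class)."""
    sentenceIndex: int
    beginIndex: int  # first token of the mention, 0-based within the sentence
    endIndex: int    # one past the last token of the mention
    animacy: str


def animacy_by_token(chains):
    """Build a table: sentence index -> token index -> animacy label."""
    lookup = defaultdict(dict)
    for chain in chains:
        for m in chain:
            for tok in range(m.beginIndex, m.endIndex):
                lookup[m.sentenceIndex][tok] = m.animacy
    return lookup


# Hand-written sample data standing in for real CoreNLP output.
sentences = [
    ["Barack", "Obama", "was", "born", "in", "Hawaii", "."],
    ["He", "is", "the", "president", "."],
]
chains = [[Mention(0, 0, 2, "ANIMATE"), Mention(1, 0, 1, "ANIMATE")]]

lookup = animacy_by_token(chains)
rows = []
for s_idx, sent in enumerate(sentences):
    for t_idx, word in enumerate(sent):
        # Tokens outside any mention get a placeholder label.
        label = lookup[s_idx].get(t_idx, "-")
        rows.append((s_idx, word, label))
        print("{}\t{:12s}\t{}".format(s_idx + 1, word, label))
```

With real output you would pass the mention lists of doc.corefChain instead of the sample chains and read token.word, token.lemma and so on in the inner loop. If two mentions overlap, the later one overwrites the earlier one in this sketch.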