NLP: is this dependency tag correct? What exactly does it mean in this situation?
I am exploring spaCy, an amazing Python library, and I got this:
text='The Titanic managed to sail into the coast intact, and Conan went to Chicago.'
spacy_doc = nlp(text)
token_pos = [token.pos_ for token in spacy_doc]
token_tag = [token.tag_ for token in spacy_doc]
token_dep = [token.dep_ for token in spacy_doc]
token_pos
['DET', 'PROPN', 'VERB', 'PART', 'VERB', 'ADP', 'DET', 'NOUN', 'SPACE', 'ADJ', 'PUNCT', 'CCONJ', 'PROPN', 'VERB', 'ADP', 'PROPN', 'PUNCT']
token_tag
['DT', 'NNP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'NN', '_SP', 'JJ', ',', 'CC', 'NNP', 'VBD', 'IN', 'NNP', '.']
token_dep
['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '', 'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct']
Tree:
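from nltk import Tree

def to_nltk_tree(node):
    # Recursively convert a spaCy token into an nltk Tree;
    # tokens without children become plain string leaves.
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]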
                    managed
 ______________________|_________________________
 |   |     |         sail                       |
 |   |     |     ______|__________              |
 |   |     |     |     |       into           went
 |   |     |     |     |         |        ______|_______
 |   |  Titanic  |     |       coast      |     |     to
 |   |     |     |     |      ___|___     |     |      |
 ,  and   The   to  intact   the        Conan   .   Chicago
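Question: I am confused about the dependency relation between "managed" and "went". It is labeled "conj". (1) Is this a misclassification? If it is, what would the correct classification be? If it is not, can you explain why? (2) Is there any way to distinguish this case from the one below? spaCy explains the label as "conjunct":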
spacy.explain('conj')
Out[59]: 'conjunct'
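A conjunct is the relation between two elements connected by a coordinating conjunction, such as "and", "or", etc.:
"Bill is big and honest"
"They either ski or snowboard"
conj(big, honest)
conj(ski, snowboard)
Now look at this last sentence: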
text='They either ski or snowboard.'
spacy_doc = nlp(text)
token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]
print(token_pos)
['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']
print(token_tag)
['PRP', 'CC', 'VBP', 'CC', 'NN', '.']
print(token_dep)
['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']
[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]
          They
 __________|____
 |            ski
 |   __________|_______
 . either     or  snowboard
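The dependency relation between "ski" and "snowboard" is also "conj", and in this case it seems to be the correct classification.
I think the answer lies in the question itself. "managed" and "went" are two elements connected by a coordinating conjunction, and that is exactly what we see in spaCy's output: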
text = 'The Titanic managed to sail into the coast intact, and Conan went to Chicago.'
spacy_doc = nlp(text)
[(token.text, token.dep_) for token in spacy_doc]
Output:
[('The', 'det'),
('Titanic', 'nsubj'),
('managed', 'ROOT'),
('to', 'aux'),
('sail', 'xcomp'),
('into', 'prep'),
('the', 'det'),
('coast', 'pobj'),
(' ', ''),
('intact', 'advmod'),
(',', 'punct'),
('and', 'cc'),
('Conan', 'nsubj'),
('went', 'conj'),
('to', 'prep'),
('Chicago', 'pobj'),
('.', 'punct')]
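One way to check what the `conj` label actually connects is to print each conjunct next to its head. The sketch below is not from the original post: it uses plain tuples instead of a loaded spaCy pipeline, with the labels copied from the output above and the head indices reconstructed from the tree printed in the question, so treat those indices as an assumption. With spaCy available, the same rows could be built with `[(t.text, t.dep_, t.head.i) for t in spacy_doc]`.

```python
# Each row is (text, dependency label, index of the head token).
# The ROOT token points at itself, which is also how spaCy stores it.
# Head indices are an assumption, read off the tree in the question.
ROWS = [
    ("The", "det", 1),
    ("Titanic", "nsubj", 2),
    ("managed", "ROOT", 2),
    ("to", "aux", 4),
    ("sail", "xcomp", 2),
    ("into", "prep", 4),
    ("the", "det", 7),
    ("coast", "pobj", 5),
    (" ", "", 7),
    ("intact", "advmod", 4),
    (",", "punct", 2),
    ("and", "cc", 2),
    ("Conan", "nsubj", 13),
    ("went", "conj", 2),
    ("to", "prep", 13),
    ("Chicago", "pobj", 14),
    (".", "punct", 13),
]


def conj_pairs(rows):
    """Return (head, dependent) text pairs for every `conj` arc."""
    return [(rows[head][0], text) for text, dep, head in rows if dep == "conj"]


print(conj_pairs(ROWS))  # [('managed', 'went')]
```

The head of the arc is the first conjunct (`managed`) and the dependent is the second (`went`), so here `conj` links the main verbs of the two coordinated clauses.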
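Yes, I believe it is correct.
text='The Titanic managed to sail into the coast intact, and Conan went to Chicago.'
In this example, the words "managed" and "went" are joined by the word "and", which is a coordinating conjunction.
This is entirely consistent with the definition you quoted from the Stanford dependencies manual: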
A conjunct is the relation between two elements connected by a
coordinating conjunction, such as “and”, “or”, etc:
“Bill is big and honest”
“They either ski or snowboard”
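As for question (2), the label itself does not separate the two cases: both arcs are `conj`. A possible heuristic, offered as an assumption rather than something spaCy provides, is to check whether the conjunct has its own subject. `went` governs `Conan` as `nsubj`, so two full clauses are coordinated, while `snowboard` has no subject of its own. The sketch uses a small stand-in class `Tok` (hypothetical, only there so the example runs without a model); the function reads just `.dep_` and `.children`, which spaCy tokens also expose.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token: text, label, dependents."""
    text: str
    dep_: str
    children: list = field(default_factory=list)


# Subject labels used by spaCy's English models.
SUBJECT_LABELS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}


def coordination_kind(conjunct):
    """Classify a `conj` token as clause-level or phrase-level coordination."""
    if conjunct.dep_ != "conj":
        raise ValueError(f"{conjunct.text!r} is not a conjunct")
    has_own_subject = any(c.dep_ in SUBJECT_LABELS for c in conjunct.children)
    return "clausal" if has_own_subject else "phrasal"


# Subtrees copied from the two parses shown in the question.
went = Tok("went", "conj", [
    Tok("Conan", "nsubj"),
    Tok("to", "prep", [Tok("Chicago", "pobj")]),
    Tok(".", "punct"),
])
snowboard = Tok("snowboard", "conj")

print(coordination_kind(went))       # clausal
print(coordination_kind(snowboard))  # phrasal
```

With a real pipeline, the same function could be applied to every token in `spacy_doc` whose `dep_` is `"conj"`.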