NLP: Is this dependency tag correct? What exactly does it mean in this situation?

I am exploring spaCy, an amazing Python library, and I got this:

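import spacy

nlp = spacy.load('en_core_web_sm')  # assumption: the post does not say which English model was loaded

text='The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'

spacy_doc = nlp(text)

token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]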

token_pos

['DET', 'PROPN', 'VERB', 'PART', 'VERB', 'ADP', 'DET', 'NOUN', 'SPACE', 'ADJ', 'PUNCT', 'CCONJ', 'PROPN', 'VERB', 'ADP', 'PROPN', 'PUNCT']

token_tag

['DT', 'NNP', 'VBD', 'TO', 'VB', 'IN', 'DT', 'NN', '_SP', 'JJ', ',', 'CC', 'NNP', 'VBD', 'IN', 'NNP', '.']

token_dep

['det', 'nsubj', 'ROOT', 'aux', 'xcomp', 'prep', 'det', 'pobj', '', 'advcl', 'punct', 'cc', 'nsubj', 'conj', 'prep', 'pobj', 'punct']

from nltk import Tree

def to_nltk_tree(node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]

                    managed                                 
  _____________________|_________________________            
 |   |     |          sail                       |          
 |   |     |      _____|__________               |           
 |   |     |     |     |         into           went        
 |   |     |     |     |          |          ____|______     
 |   |  Titanic  |     |        coast       |    |      to  
 |   |     |     |     |      ____|____     |    |      |    
 ,  and   The    to  intact the           Conan  .   Chicago

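Question: I am confused about the dependency between "managed" and "went". It is labeled "conj". (1) Is this a misclassification? If so, what would the correct label be? If not, could you explain why it is correct? (2) Is there any way to distinguish this case from the one below? spaCy explains the label as "conjunct":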

spacy.explain('conj')
Out[59]: 'conjunct'

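According to the Stanford dependencies manual:

A conjunct is the relation between two elements connected by a coordinating conjunction, such as "and", "or", etc.:

"Bill is big and honest"

"They either ski or snowboard"

conj(big, honest)

conj(ski, snowboard)

Now look at this last sentence: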

text='They either ski or snowboard.'

spacy_doc = nlp(text)

token_pos=[token.pos_ for token in spacy_doc]
token_tag=[token.tag_ for token in spacy_doc]
token_dep=[token.dep_ for token in spacy_doc]

print(token_pos)
['PRON', 'CCONJ', 'VERB', 'CCONJ', 'NOUN', 'PUNCT']

print(token_tag)
['PRP', 'CC', 'VBP', 'CC', 'NN', '.']

print(token_dep)
['ROOT', 'preconj', 'appos', 'cc', 'conj', 'punct']

[to_nltk_tree(sent.root).pretty_print() for sent in spacy_doc.sents]
           They              
  __________|____             
 |              ski          
 |     __________|______      
 .  either       or snowboard

The dependency relation between "ski" and "snowboard" is also "conj", and in this case it seems to be the correct classification.

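I think the answer lies in your question itself. "managed" and "went" are two elements connected by a coordinating conjunction, and that is exactly what we see in spaCy's output: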

text = 'The Titanic managed to sail into the coast  intact, and Conan went to Chicago.'

spacy_doc = nlp(text)
[(token.text, token.dep_) for token in spacy_doc]

Output:

[('The', 'det'),
 ('Titanic', 'nsubj'),
 ('managed', 'ROOT'),
 ('to', 'aux'),
 ('sail', 'xcomp'),
 ('into', 'prep'),
 ('the', 'det'),
 ('coast', 'pobj'),
 (' ', ''),
 ('intact', 'advmod'),
 (',', 'punct'),
 ('and', 'cc'),
 ('Conan', 'nsubj'),
 ('went', 'conj'),
 ('to', 'prep'),
 ('Chicago', 'pobj'),
 ('.', 'punct')]
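
As for the second part of the question, one possible way to tell the two cases apart is to check whether the second conjunct has a subject of its own. In the Titanic sentence, "went" has its own nsubj ("Conan"), so two full clauses are coordinated; in the ski/snowboard sentence, "snowboard" has no subject of its own. The following is only a minimal sketch of that heuristic: the function name `conjuncts_with_subject_flag` is made up for illustration, and the result depends entirely on what the parser produced.

```python
def conjuncts_with_subject_flag(doc):
    """Return (first_conjunct, second_conjunct, has_own_subject) for every conj arc.

    `doc` is expected to be a spaCy Doc (or any iterable of tokens exposing
    .text, .dep_, .head and .children).
    """
    results = []
    for token in doc:
        if token.dep_ != "conj":
            continue
        # A conjunct with its own subject usually heads a separate clause.
        has_own_subject = any(
            child.dep_ in ("nsubj", "nsubjpass") for child in token.children
        )
        results.append((token.head.text, token.text, has_own_subject))
    return results
```

Calling `conjuncts_with_subject_flag(spacy_doc)` on each of the two parsed sentences should then flag the first "conj" as clause-level coordination and the second as coordination under a shared subject, provided the parses match the ones shown in the question.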

Yes, I believe this is correct.

text='The Titanic managed to sail into the coast intact, and Conan went to Chicago.'

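In this example, the words "managed" and "went" are joined by the word "and", which is a coordinating conjunction.

This is exactly consistent with the definition you quoted from the Stanford dependencies manual: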

A conjunct is the relation between two elements connected by a coordinating conjunction, such as “and”, “or”, etc:

“Bill is big and honest”

“They either ski or snowboard”
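
To make the "two elements connected by a coordinating conjunction" reading concrete, the sketch below collects each conj arc together with the conjunction that links it. It assumes what the trees in the question show: the "cc" token ("and", "or") is attached to the first conjunct and sits between the two conjuncts. The helper name `coordination_triples` is hypothetical, not part of spaCy.

```python
def coordination_triples(doc):
    """Return (first_conjunct, conjunction, second_conjunct) for every conj arc.

    `doc` is expected to be a spaCy Doc (or any iterable of tokens exposing
    .text, .dep_, .head, .children and the token index .i).
    """
    triples = []
    for token in doc:
        if token.dep_ != "conj":
            continue
        first = token.head
        # Conjunctions attached to the first conjunct and located between the two conjuncts.
        between = [
            child for child in first.children
            if child.dep_ == "cc" and first.i < child.i < token.i
        ]
        # Take the conjunction closest to the second conjunct; None if there is none.
        conjunction = between[-1].text if between else None
        triples.append((first.text, conjunction, token.text))
    return triples
```

With the parses shown in the question, this would be expected to pair "managed" and "went" through "and", and "ski" and "snowboard" through "or".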