NLP: Compare these two sentences. Is this a misclassification?

I am using spaCy's dependency parser, and I am confused by these two very similar sentences.

Sentence 1:

text='He noted his father was a nice guy.'

Note that in this sentence "father" is clearly the subject of the clause "his father was a nice guy":

[(0, 'He', '-PRON-', 'PRON', 'PRP', 'nsubj'), (1, 'noted', 'note', 'VERB', 'VBD', 'ROOT'), (2, 'his', '-PRON-', 'DET', 'PRP$', 'poss'), (3, 'father', 'father', 'NOUN', 'NN', 'nsubj'), (4, 'was', 'be', 'VERB', 'VBD', 'ccomp'), (5, 'a', 'a', 'DET', 'DT', 'det'), (6, 'nice', 'nice', 'ADJ', 'JJ', 'amod'), (7, 'guy', 'guy', 'NOUN', 'NN', 'attr'), (8, '.', '.', 'PUNCT', '.', 'punct')]

        noted              
  ________|_____            
 |   |         was         
 |   |     _____|___        
 |   |  father     guy     
 |   |    |      ___|___    
 He  .   his    a      nice

the_verb=spacy_doc[4]  # "was"

for child in the_verb.children:
    print(child,child.dep_)
    
>> father nsubj
>> guy attr

for ancestor in the_verb.ancestors:
    print(ancestor,ancestor.dep_)
    
>> noted ROOT
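To make concrete what `children` and `ancestors` walk over, here is a minimal pure-Python sketch (no spaCy needed) that stores Sentence 1 as a head array transcribed by hand from the tree above. The names `heads`, `children` and `ancestors` are illustrative assumptions, not spaCy internals; the one convention borrowed from spaCy is that the root token is its own head.

```python
# Sentence 1, transcribed by hand from the tree printed above (not produced by spaCy here).
words = ["He", "noted", "his", "father", "was", "a", "nice", "guy", "."]
deps = ["nsubj", "ROOT", "poss", "nsubj", "ccomp", "det", "amod", "attr", "punct"]
# heads[i] is the index of token i's head; the root ("noted") points to itself.
heads = [1, 1, 3, 4, 1, 7, 7, 4, 1]


def children(i):
    """Indices of the tokens whose head is i, in sentence order (root's self-loop excluded)."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def ancestors(i):
    """Indices met while walking up the head chain from token i to the root."""
    chain = []
    while heads[i] != i:
        i = heads[i]
        chain.append(i)
    return chain


the_verb = 4  # "was"
for c in children(the_verb):
    print(words[c], deps[c])  # father nsubj, then guy attr
for a in ancestors(the_verb):
    print(words[a], deps[a])  # noted ROOT
```

Read this way, "father" counts as the subject because it is a child of "was" carrying the label nsubj; for Sentence 2 the same question reduces to which token "father" is attached to.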

Sentence 2:

text='He noted his father, as \"a man with different attributes\", was a nice guy.'

This is only a slight variation of the previous sentence, yet "father" is no longer the subject.

[(0, 'He', '-PRON-', 'PRON', 'PRP', 'nsubj'), (1, 'noted', 'note', 'VERB', 'VBD', 'ROOT'), (2, 'his', '-PRON-', 'DET', 'PRP$', 'poss'), (3, 'father', 'father', 'NOUN', 'NN', 'dobj'), (4, ',', ',', 'PUNCT', ',', 'punct'), (5, 'as', 'as', 'ADP', 'IN', 'prep'), (6, '"', '"', 'PUNCT', '``', 'punct'), (7, 'a', 'a', 'DET', 'DT', 'det'), (8, 'man', 'man', 'NOUN', 'NN', 'pobj'), (9, 'with', 'with', 'ADP', 'IN', 'prep'), (10, 'different', 'different', 'ADJ', 'JJ', 'amod'), (11, 'attributes', 'attribute', 'NOUN', 'NNS', 'pobj'), (12, '"', '"', 'PUNCT', "''", 'punct'), (13, ',', ',', 'PUNCT', ',', 'punct'), (14, 'was', 'be', 'VERB', 'VBD', 'conj'), (15, 'a', 'a', 'DET', 'DT', 'det'), (16, 'nice', 'nice', 'ADJ', 'JJ', 'amod'), (17, 'guy', 'guy', 'NOUN', 'NN', 'attr'), (18, '.', '.', 'PUNCT', '.', 'punct')]

                noted                                 
  ________________|____________________________        
 |   |   |   |    |         as                 |      
 |   |   |   |    |         |                  |       
 |   |   |   |    |        man                 |      
 |   |   |   |    |      ___|______            |       
 |   |   |   |    |     |   |     with        was     
 |   |   |   |    |     |   |      |           |       
 |   |   |   |  father  |   a  attributes     guy     
 |   |   |   |    |     |   |      |        ___|___    
 He  ,   ,   .   his    "   "  different   a      nice


the_verb=spacy_doc[14]

for child in the_verb.children:
    print(child,child.dep_)
    
>> guy attr

for ancestor in the_verb.ancestors:
    print(ancestor,ancestor.dep_)
    
>> noted ROOT

I want to understand how spaCy labels these sentences. Is the second case a misclassification? Shouldn't "father" still be the subject?

I wonder whether you are thinking of (phrase-structure) parse trees rather than dependency trees...

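Honestly, dependency trees have always confused me. They are good at identifying the relative links between words, but I don't think they are good at determining absolute semantic structure. Phrase-structure rules are very good at determining the absolute parts of speech of particular nouns, verbs and their constituents, though they are still not perfect. Dependency parsers can be used to detect noun chunks and prepositional phrases, and to infer verb phrases, but I don't think that is their main function. That is the main function of parse trees.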
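As an illustration of how phrases can still be read off a dependency tree even though it has no phrase nodes: a phrase is the yield of a head's subtree (spaCy exposes this as `Token.subtree`). The sketch below recomputes it in plain Python over the Sentence 1 head array transcribed from the question; the helper names are hypothetical.

```python
# Sentence 1 head array, transcribed from the question's tree; the root points to itself.
words = ["He", "noted", "his", "father", "was", "a", "nice", "guy", "."]
heads = [1, 1, 3, 4, 1, 7, 7, 4, 1]


def children(i):
    """Tokens whose head is i (the root's self-loop is excluded)."""
    return [j for j, h in enumerate(heads) if h == i and j != i]


def subtree(i):
    """Token i plus all of its descendants, sorted into sentence order."""
    nodes = [i]
    for c in children(i):
        nodes.extend(subtree(c))
    return sorted(nodes)


def phrase(i):
    """The words covered by the subtree headed by token i."""
    return " ".join(words[j] for j in subtree(i))


print(phrase(3))  # his father
print(phrase(7))  # a nice guy
print(phrase(4))  # his father was a nice guy
```

The noun chunk "his father" and the clause "his father was a nice guy" only appear as derived spans here; nothing in the tree labels them NP or S, which is the contrast with phrase-structure trees that the paragraph above draws.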

Back to your question:

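The way you talk about "father" as the subject sounds as if you are trying to understand the deep (absolute) syntactic structure while using a relative model (the dependency parser).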

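Essentially, I think the phrase 'as "a man with different attributes"' adds layers to the dependency tree. These layers separate the actual subject, "his father", from the verb phrase "was a nice guy". I imagine it adds one layer for the commas, another for the quotes and another for the as-clause, until eventually the relative relations the dependency parser is supposed to determine become "too far apart".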
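A rough way to put numbers on the "too far apart" intuition is to compare, for "father" and "was", how many words separate them, how many dependency arcs separate them in each parse, and whether "was" has any nsubj dependent at all. This is only a sketch over arcs transcribed by hand from the two trees above (punctuation arcs are left out because they play no role here); it does not describe how spaCy's parser actually reaches its decisions.

```python
from collections import defaultdict, deque

# Labelled arcs (dependent index, head index, label), transcribed from the two trees above.
SENT1 = [(0, 1, "nsubj"), (2, 3, "poss"), (3, 4, "nsubj"), (4, 1, "ccomp"),
         (5, 7, "det"), (6, 7, "amod"), (7, 4, "attr")]
SENT2 = [(0, 1, "nsubj"), (2, 3, "poss"), (3, 1, "dobj"), (5, 1, "prep"),
         (7, 8, "det"), (8, 5, "pobj"), (9, 8, "prep"), (10, 11, "amod"),
         (11, 9, "pobj"), (14, 1, "conj"), (15, 17, "det"), (16, 17, "amod"),
         (17, 14, "attr")]


def subjects_of(arcs, verb):
    """Dependents of `verb` that carry the nsubj label."""
    return [d for d, h, label in arcs if h == verb and label == "nsubj"]


def tree_distance(arcs, a, b):
    """Number of arcs on the path between tokens a and b, ignoring arc direction (BFS)."""
    graph = defaultdict(list)
    for d, h, _ in arcs:
        graph[d].append(h)
        graph[h].append(d)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # a and b are not connected


FATHER = 3
for name, arcs, was in [("Sentence 1", SENT1, 4), ("Sentence 2", SENT2, 14)]:
    print(name,
          "| words apart:", was - FATHER,
          "| arcs apart:", tree_distance(arcs, FATHER, was),
          "| nsubj of 'was':", subjects_of(arcs, was))
```

In Sentence 2 the arc distance only grows from 1 to 2, but the path now runs through "noted": "father" hangs off "noted" as dobj and "was" is attached to "noted" as conj, so "was" is left with no subject of its own. What grows most is the linear gap between the two words (1 word versus 11), which fits the intuition above.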

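Syntactic analyses can only be as good as the models that produce them. In fact, spaCy attaches two separate syntactic annotations to each token: the dependency label predicted by the parser (available as token.dep_) and the part-of-speech tag predicted by the tagger (available as token.pos_). Both come from statistical models, and because those models are imprecise, the two annotations are not always consistent with each other.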
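One way to see that `pos_` and `dep_` are separate annotations is to line up the words the two sentences share, using the values printed in the question. This is a plain-Python sketch over hand-copied values (the dictionaries are not spaCy objects); "a" is skipped because it occurs twice in Sentence 2.

```python
# word -> (pos_, tag_, dep_), copied from the two token listings in the question.
SENT1 = {"He": ("PRON", "PRP", "nsubj"), "noted": ("VERB", "VBD", "ROOT"),
         "his": ("DET", "PRP$", "poss"), "father": ("NOUN", "NN", "nsubj"),
         "was": ("VERB", "VBD", "ccomp"), "nice": ("ADJ", "JJ", "amod"),
         "guy": ("NOUN", "NN", "attr")}
SENT2 = {"He": ("PRON", "PRP", "nsubj"), "noted": ("VERB", "VBD", "ROOT"),
         "his": ("DET", "PRP$", "poss"), "father": ("NOUN", "NN", "dobj"),
         "was": ("VERB", "VBD", "conj"), "nice": ("ADJ", "JJ", "amod"),
         "guy": ("NOUN", "NN", "attr")}

# Words whose tagger output (pos_, tag_) differs between the two sentences.
pos_changed = [w for w in SENT1 if SENT1[w][:2] != SENT2[w][:2]]
# Words whose parser output (dep_) differs between the two sentences.
dep_changed = [w for w in SENT1 if SENT1[w][2] != SENT2[w][2]]

for w in SENT1:
    print(f"{w:7} pos_: {SENT1[w][0]:5} -> {SENT2[w][0]:5}  dep_: {SENT1[w][2]} -> {SENT2[w][2]}")
print("pos_ changed:", pos_changed)
print("dep_ changed:", dep_changed)
```

Every shared word keeps its tag, while only "father" and "was" change their dependency labels, which is exactly where the two parses disagree.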

Out of interest, I believe NLTK has a more traditional parse tree based on phrase-structure rules available, although even these have limitations. If you want deep, hard-core syntactic analyses of real-life sentences, you may want to try something like Head-driven Phrase Structure Grammar (HPSG), but you will find things start getting just a little bit technical. :)
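For a feel of what a rules-based phrase-structure analysis gives you, here is a self-contained sketch: a small CYK chart parser with a toy grammar in Chomsky normal form. The grammar is a hypothetical one written only to cover Sentence 1 (without the final period); it is not NLTK's API or any published grammar, and in practice NLTK's own chart parsers would be the more sensible tool.

```python
# Hypothetical toy grammar in Chomsky normal form; it covers Sentence 1 only.
LEXICON = {
    "He": {"NP"}, "noted": {"V"}, "his": {"Det"}, "father": {"N"},
    "was": {"V"}, "a": {"Det"}, "nice": {"Adj"}, "guy": {"N"},
}
RULES = [                # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("VP", "V", "S"),    # clausal complement without "that": noted [his father was ...]
    ("VP", "V", "NP"),
    ("NP", "Det", "N"),
    ("NP", "Det", "NOM"),
    ("NOM", "Adj", "N"),
]


def cyk(words):
    """Fill a CYK chart: chart[i][j] maps each label to one tree spanning words[i:j]."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for label in LEXICON[w]:
            chart[i][i + 1][label] = (label, w)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    if left in chart[i][k] and right in chart[k][j] and parent not in chart[i][j]:
                        chart[i][j][parent] = (parent, chart[i][k][left], chart[k][j][right])
    return chart


def bracket(tree):
    """Render a (label, child, ...) tuple as a bracketed string."""
    label, *kids = tree
    if len(kids) == 1 and isinstance(kids[0], str):
        return f"({label} {kids[0]})"
    return f"({label} " + " ".join(bracket(k) for k in kids) + ")"


words = "He noted his father was a nice guy".split()
chart = cyk(words)
tree = chart[0][len(words)]["S"]
print(bracket(tree))
```

In the resulting tree, the embedded S has (NP his father) as its first daughter, so "his father" is explicitly the subject NP of "was a nice guy". The chart also holds a VP over "noted his father" (chart[1][4]), the V + NP reading in which "father" is an object; only the rest of the sentence rules it out. That object reading is the one spaCy chose for Sentence 2.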