NLP: Compare these two sentences. Is this a misclassification?
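I'm using spaCy's dependency parser, and I'm confused by these two very similar sentences.
Sentence 1:
text='He noted his father was a nice guy.'
Note that in this sentence "father" is clearly the subject of "father was a nice guy":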
[(0, 'He', '-PRON-', 'PRON', 'PRP', 'nsubj'), (1, 'noted', 'note', 'VERB', 'VBD', 'ROOT'), (2, 'his', '-PRON-', 'DET', 'PRP$', 'poss'), (3, 'father', 'father', 'NOUN', 'NN', 'nsubj'), (4, 'was', 'be', 'VERB', 'VBD', 'ccomp'), (5, 'a', 'a', 'DET', 'DT', 'det'), (6, 'nice', 'nice', 'ADJ', 'JJ', 'amod'), (7, 'guy', 'guy', 'NOUN', 'NN', 'attr'), (8, '.', '.', 'PUNCT', '.', 'punct')]
noted
    He
    was
        father
            his
        guy
            a
            nice
    .
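spacy_doc = nlp(text)    # nlp: an already-loaded spaCy English pipeline (model not stated)
the_verb = spacy_doc[4]  # 'was'
for child in the_verb.children:
    print(child, child.dep_)
# father nsubj
# guy attr
for ancestor in the_verb.ancestors:
    print(ancestor, ancestor.dep_)
# noted ROOT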
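To see what `the_verb.children` and `the_verb.ancestors` are walking over, it can help to write the parse out as a plain array of head indices. The following is a minimal pure-Python sketch, not spaCy code: the `(text, dep, head)` triples and the helpers `children`, `ancestors` and `bracketed` are hypothetical, and the head indices are read off the tree printed for sentence 1. `bracketed` writes each head word followed by its dependents, which is roughly the information the ASCII tree draws.

```python
# Sentence 1 as (text, dep, head_index) triples. Head indices are read off the
# tree in the question; as in spaCy, the root is its own head.
SENT1 = [
    ("He", "nsubj", 1), ("noted", "ROOT", 1), ("his", "poss", 3),
    ("father", "nsubj", 4), ("was", "ccomp", 1), ("a", "det", 7),
    ("nice", "amod", 7), ("guy", "attr", 4), (".", "punct", 1),
]


def children(sent, i):
    """Indices of tokens whose head is token i (analogous to Token.children)."""
    return [j for j, (_, _, head) in enumerate(sent) if head == i and j != i]


def ancestors(sent, i):
    """Indices from token i's head up to the root (analogous to Token.ancestors)."""
    result = []
    while sent[i][2] != i:
        i = sent[i][2]
        result.append(i)
    return result


def bracketed(sent, i):
    """Head word first, then the bracketed subtrees of its dependents in order."""
    kids = children(sent, i)
    if not kids:
        return sent[i][0]
    return "(" + " ".join([sent[i][0]] + [bracketed(sent, k) for k in kids]) + ")"


root = next(i for i, (_, dep, _) in enumerate(SENT1) if dep == "ROOT")
print([(SENT1[c][0], SENT1[c][1]) for c in children(SENT1, 4)])   # [('father', 'nsubj'), ('guy', 'attr')]
print([(SENT1[a][0], SENT1[a][1]) for a in ancestors(SENT1, 4)])  # [('noted', 'ROOT')]
print(bracketed(SENT1, root))  # (noted He (was (father his) (guy a nice)) .)
```

The point of the sketch is that a dependency parse is nothing more than one head per token; "subject" is just the label on the arc from "father" to its head.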
Sentence 2:
text='He noted his father, as \"a man with different attributes\", was a nice guy.'
This is a slight variation on the previous sentence, yet "father" is no longer the subject.
[(0, 'He', '-PRON-', 'PRON', 'PRP', 'nsubj'), (1, 'noted', 'note', 'VERB', 'VBD', 'ROOT'), (2, 'his', '-PRON-', 'DET', 'PRP$', 'poss'), (3, 'father', 'father', 'NOUN', 'NN', 'dobj'), (4, ',', ',', 'PUNCT', ',', 'punct'), (5, 'as', 'as', 'ADP', 'IN', 'prep'), (6, '"', '"', 'PUNCT', '``', 'punct'), (7, 'a', 'a', 'DET', 'DT', 'det'), (8, 'man', 'man', 'NOUN', 'NN', 'pobj'), (9, 'with', 'with', 'ADP', 'IN', 'prep'), (10, 'different', 'different', 'ADJ', 'JJ', 'amod'), (11, 'attributes', 'attribute', 'NOUN', 'NNS', 'pobj'), (12, '"', '"', 'PUNCT', "''", 'punct'), (13, ',', ',', 'PUNCT', ',', 'punct'), (14, 'was', 'be', 'VERB', 'VBD', 'conj'), (15, 'a', 'a', 'DET', 'DT', 'det'), (16, 'nice', 'nice', 'ADJ', 'JJ', 'amod'), (17, 'guy', 'guy', 'NOUN', 'NN', 'attr'), (18, '.', '.', 'PUNCT', '.', 'punct')]
noted
    He
    ,
    ,
    .
    father
        his
    as
        man
            "
            a
            "
            with
                attributes
                    different
    was
        guy
            a
            nice
the_verb = spacy_doc[14]  # 'was'
for child in the_verb.children:
    print(child, child.dep_)
# guy attr
for ancestor in the_verb.ancestors:
    print(ancestor, ancestor.dep_)
# noted ROOT
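The two printouts can also be compared mechanically. This sketch uses only the standard-library `difflib` to align the tokens of the two sentences by their text and list the ones whose `dep_` label differs. The `(text, dep_)` pairs are copied from the tuples printed above; the helper name `label_changes` is hypothetical.

```python
import difflib

# (text, dep_) pairs copied from the two parses in the question.
SENT1 = [("He", "nsubj"), ("noted", "ROOT"), ("his", "poss"), ("father", "nsubj"),
         ("was", "ccomp"), ("a", "det"), ("nice", "amod"), ("guy", "attr"), (".", "punct")]
SENT2 = [("He", "nsubj"), ("noted", "ROOT"), ("his", "poss"), ("father", "dobj"),
         (",", "punct"), ("as", "prep"), ('"', "punct"), ("a", "det"), ("man", "pobj"),
         ("with", "prep"), ("different", "amod"), ("attributes", "pobj"), ('"', "punct"),
         (",", "punct"), ("was", "conj"), ("a", "det"), ("nice", "amod"),
         ("guy", "attr"), (".", "punct")]


def label_changes(a, b):
    """Align two token sequences by text; return (text, old_dep, new_dep) where the label changed."""
    matcher = difflib.SequenceMatcher(a=[t for t, _ in a], b=[t for t, _ in b], autojunk=False)
    changes = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            (text, dep_a), (_, dep_b) = a[block.a + k], b[block.b + k]
            if dep_a != dep_b:
                changes.append((text, dep_a, dep_b))
    return changes


for text, before, after in label_changes(SENT1, SENT2):
    print(f"{text}: {before} -> {after}")
# father: nsubj -> dobj
# was: ccomp -> conj
```

Only two labels change, and they change together: "father" is reattached to "noted" as its object, and "was" becomes a conjunct of "noted" instead of its clausal complement.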
I'd like to understand how spaCy is analyzing these sentences. Is the second case a misclassification? Surely "father" should still be the subject?
I wonder whether you're thinking in terms of a parse tree rather than a dependency tree...
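To be honest, I've always found dependency trees confusing. They are good at identifying the relative connections between structures, but I don't think they are good at determining the absolute semantic structure. Phrase-structure rules are very good at determining the absolute parts of speech of particular nouns, verbs and their constituents, though they are not perfect either. A dependency parser can be used to detect noun chunks and prepositional phrases, and to infer verb phrases, but I don't think that is its main function. That is the main function of a parse tree.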
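Back to your question:
The way you talk about "father" being the subject sounds like you're trying to get at the deep (absolute) syntactic structure while using a relative model, the dependency parser.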
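Essentially, I think inserting the phrase 'as "a man with different attributes"' adds layers to the dependency tree. These layers separate the actual subject, "his father", from the verb phrase "was a nice guy". I'd guess it adds one layer for the commas, another for the quotes, and another for the as-clause, until eventually the relative relationship the dependency parser is supposed to determine becomes "too far" apart.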
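One concrete way to look at the "too far" argument is to measure how far apart "father" and "was" are, both in the token sequence and in the predicted tree. A minimal sketch, assuming head-index arrays read off the two trees in the question (the root is its own head, as in spaCy). Attaching the two quote tokens in sentence 2 to "man" is an assumption, though it does not affect the distances computed here; the helper names are hypothetical.

```python
# Head index of each token, read off the trees in the question.
# Sentence 1: He noted his father was a nice guy .
HEADS1 = [1, 1, 3, 4, 1, 7, 7, 4, 1]
# Sentence 2: He noted his father , as " a man with different attributes " , was a nice guy .
# (Assumption: both quote tokens attach to "man".)
HEADS2 = [1, 1, 3, 1, 1, 1, 8, 8, 5, 8, 11, 9, 8, 1, 1, 17, 17, 14, 1]
FATHER = 3
WAS1, WAS2 = 4, 14


def path_to_root(heads, i):
    """Token i followed by its ancestors, ending at the root (a token that is its own head)."""
    path = [i]
    while heads[i] != i:
        i = heads[i]
        path.append(i)
    return path


def tree_distance(heads, i, j):
    """Number of dependency arcs on the path between tokens i and j."""
    up_i, up_j = path_to_root(heads, i), path_to_root(heads, j)
    common = next(n for n in up_i if n in up_j)  # lowest common ancestor
    return up_i.index(common) + up_j.index(common)


for name, heads, was in (("sentence 1", HEADS1, WAS1), ("sentence 2", HEADS2, WAS2)):
    print(f"{name}: tree distance {tree_distance(heads, FATHER, was)}, "
          f"linear distance {was - FATHER}")
# sentence 1: tree distance 1, linear distance 1
# sentence 2: tree distance 2, linear distance 11
```

In sentence 1 "father" is one arc below "was" and immediately before it; in sentence 2 the parenthetical pushes "was" 11 positions after "father", and the parser makes the two siblings under "noted". This only measures the outcome; it does not show why the model chose that attachment.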
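Syntactic analyses are only as good as the models that produce them. In fact, spaCy gives each token two kinds of syntactic annotation, both predicted statistically: a dependency label from the parser (available as token.dep_) and a part-of-speech tag from the tagger (available as token.pos_). Because the models that predict them are imprecise, you'll also see that these two annotations are not always consistent with each other.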
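A rough way to check whether the two annotations agree is to flag tokens whose dependency label implies a nominal word while the tagger says otherwise. The rule sets `NOMINAL_DEPS` and `NOMINAL_POS` below are a hypothetical heuristic, not something spaCy defines; the input is the sentence 2 printout from the question, copied verbatim.

```python
# Token tuples exactly as printed in the question for sentence 2:
# (i, text, lemma_, pos_, tag_, dep_)
SENT2 = [
    (0, 'He', '-PRON-', 'PRON', 'PRP', 'nsubj'), (1, 'noted', 'note', 'VERB', 'VBD', 'ROOT'),
    (2, 'his', '-PRON-', 'DET', 'PRP$', 'poss'), (3, 'father', 'father', 'NOUN', 'NN', 'dobj'),
    (4, ',', ',', 'PUNCT', ',', 'punct'), (5, 'as', 'as', 'ADP', 'IN', 'prep'),
    (6, '"', '"', 'PUNCT', '``', 'punct'), (7, 'a', 'a', 'DET', 'DT', 'det'),
    (8, 'man', 'man', 'NOUN', 'NN', 'pobj'), (9, 'with', 'with', 'ADP', 'IN', 'prep'),
    (10, 'different', 'different', 'ADJ', 'JJ', 'amod'),
    (11, 'attributes', 'attribute', 'NOUN', 'NNS', 'pobj'),
    (12, '"', '"', 'PUNCT', "''", 'punct'), (13, ',', ',', 'PUNCT', ',', 'punct'),
    (14, 'was', 'be', 'VERB', 'VBD', 'conj'), (15, 'a', 'a', 'DET', 'DT', 'det'),
    (16, 'nice', 'nice', 'ADJ', 'JJ', 'amod'), (17, 'guy', 'guy', 'NOUN', 'NN', 'attr'),
    (18, '.', '.', 'PUNCT', '.', 'punct'),
]

# Hypothetical heuristic: these dependency labels should sit on a nominal word.
NOMINAL_DEPS = {"nsubj", "nsubjpass", "dobj", "pobj", "attr"}
NOMINAL_POS = {"NOUN", "PROPN", "PRON"}


def mismatches(tokens):
    """Return (text, pos_, dep_) for tokens whose dep_ implies a noun but whose pos_ does not."""
    return [
        (text, pos, dep)
        for _, text, _, pos, _, dep in tokens
        if dep in NOMINAL_DEPS and pos not in NOMINAL_POS
    ]


print(mismatches(SENT2))  # []
```

On this parse it returns an empty list: every token in a subject, object or attribute slot is tagged NOUN or PRON, so here the surprising part is the attachment of "father", not a tagger/parser disagreement.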
Out of interest, I believe NLTK has a more traditional parse tree based on phrase-structure rules available, although even these have limitations. If you want deep, hard-core syntactic analyses of real-life sentences, you may want to try something like Head-driven Phrase Structure Grammar (HPSG), but you'll find things start to get just a little bit technical. :)
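For contrast, this is roughly what a phrase-structure analysis of sentence 1 looks like. The sketch is a minimal CKY parser over a tiny hand-written grammar in Chomsky normal form; the grammar, the lexicon and the `cky` helper are hypothetical, cover only sentence 1 without the period, and are not NLTK's API. In the result, "his father" is the NP directly under the embedded S, which is the structural sense of "subject" the question has in mind.

```python
from itertools import product

# Hypothetical toy grammar (Chomsky normal form) written only for sentence 1.
LEXICON = {
    "He": {"NP"}, "noted": {"V"}, "his": {"Det"}, "father": {"N"},
    "was": {"V"}, "a": {"Det"}, "nice": {"Adj"}, "guy": {"N"},
}
BINARY = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("Det", "Nom"): "NP",
    ("Adj", "N"): "Nom",
    ("V", "NP"): "VP",
    ("V", "S"): "VP",
}


def cky(words):
    """Return a bracketed S parse of `words`, or None if the toy grammar rejects them."""
    n = len(words)
    # chart[i][j] maps a label to one bracketed subtree spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for label in LEXICON.get(w, ()):
            chart[i][i + 1][label] = f"({label} {w})"
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    parent = BINARY.get((left, right))
                    if parent and parent not in chart[i][j]:
                        chart[i][j][parent] = f"({parent} {chart[i][k][left]} {chart[k][j][right]})"
    return chart[0][n].get("S")


print(cky("He noted his father was a nice guy".split()))
```

Covering sentence 2 would need further rules for the commas, the quotes and the "as" phrase, which is where hand-written phrase-structure grammars start to show the limitations mentioned above.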