How can I identify the perpetrator and victim in a sentence using NLP?

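I am new to NLP and am looking for topics to explore that would help me determine the subject of a sentence. Specifically, I want to identify the victim and the attacker in cases like these: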

The UK was attacked by China over several weeks

Over several weeks, China attacked the UK.

Using spaCy, I have identified the subjects, but they change depending on their position in the sentence:

import spacy

nlp = spacy.load("en_core_web_sm")
doc1 = nlp("China attacked the UK over several weeks")
doc2 = nlp("The UK was attacked by China over several weeks")
docs = [doc1, doc2]
for doc in docs:
    print("============")
    # For each noun chunk: its text, root token, dependency label, and head
    for chunk in doc.noun_chunks:
        print(chunk.text, chunk.root.text, chunk.root.dep_,
              chunk.root.head.text)

Output:

============
China China nsubj attacked
the UK UK dobj attacked
several weeks weeks pobj over
============
The UK UK nsubjpass attacked
China China pobj by
several weeks weeks pobj over

Any help and guidance would be greatly appreciated.

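This is called semantic role labeling, and it is hard. In spaCy, our general advice is not to model this as NER. Instead, use generic NER labels like PERSON (or GPE here) together with the dependency parse, and see how far that gets you before considering other approaches.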
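As a rough sketch of the dependency-parse approach, the labels in the question's output are already enough to cover both example sentences: in the active sentence the attacker is the verb's `nsubj` and the victim its `dobj`, while in the passive sentence the victim is the `nsubjpass` and the attacker is the `pobj` of the "by" phrase. The function name `extract_roles` and the helper `demo` are made up for illustration, and the code assumes that the "by" token is attached to the verb with the `agent` label, which is what the English models usually produce but is not shown in the question's output.

```python
def extract_roles(doc):
    """Return (agent, verb, patient) text triples based on dependency labels.

    `doc` is expected to be a spaCy Doc (any iterable of tokens exposing
    .text, .pos_, .dep_ and .children works).
    """
    triples = []
    for token in doc:
        if token.pos_ != "VERB":
            continue
        agent = patient = None
        for child in token.children:
            if child.dep_ == "nsubj":        # active voice: subject acts
                agent = child
            elif child.dep_ == "dobj":       # active voice: object is acted on
                patient = child
            elif child.dep_ == "nsubjpass":  # passive voice: subject is acted on
                patient = child
            elif child.dep_ == "agent":      # assumed label of the passive "by"
                for grandchild in child.children:
                    if grandchild.dep_ == "pobj":
                        agent = grandchild
        # Only report verbs where both roles were found
        if agent is not None and patient is not None:
            triples.append((agent.text, token.text, patient.text))
    return triples


def demo():
    # Requires spaCy and the en_core_web_sm model to be installed
    import spacy

    nlp = spacy.load("en_core_web_sm")
    for text in ("China attacked the UK over several weeks",
                 "The UK was attacked by China over several weeks"):
        print(extract_roles(nlp(text)))
```

Calling `demo()` prints the triples for the two sentences from the question. If the parser labels them as in the output above, both should come out with China as the attacker and the UK as the victim. This only handles simple active and passive clauses. Checking `token.ent_type_` (for example, keeping only GPE or PERSON tokens) is one way to combine it with the generic NER labels mentioned above.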

See section 10 of chapter 4 of the spaCy course for a very specific walkthrough of this problem.

For an overview of the research on this topic, I recommend Jurafsky & Martin's book.