SpaCy NER: Can the same word be part of two different entities?

For example:

Sentence: The best product in the world is Nestle Cookie.

Entities:

BRAND: Nestle

PRODUCT: Nestle Cookie

Are the entities above valid, or should I label them as:

Entities:

BRAND: Nestle

PRODUCT: Cookie

Will it affect the model's performance?

From the documentation:

The entity recognizer is constrained to predict only non-overlapping, non-nested spans. The training data should obey the same constraint. If you like, you could have two sentences with the different annotations in your data. I’m not sure whether this would hurt or help your performance, though.

If you want spaCy to learn to recover both annotations, you could have two EntityRecognizer instances in the pipeline. You would need to move the entity annotations into an extension attribute, because you don’t want the second entity recogniser to overwrite the entities set by the first one.
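As a rough illustration of the constraint and of the extension-attribute idea quoted above, here is a minimal sketch. It assumes spaCy is installed and uses a blank English pipeline, so nothing is trained. The spans are set by hand where a first recognizer would normally produce them, and the attribute name `brand_ents` is made up for this example.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # tokenizer only, no trained components
text = "The best product in the world is Nestle Cookie."
doc = nlp(text)

# Character offsets of the two words we care about.
n_start = text.index("Nestle")
n_end = n_start + len("Nestle")
c_start = text.index("Cookie")
c_end = c_start + len("Cookie")

brand = doc.char_span(n_start, n_end, label="BRAND")
product_flat = doc.char_span(c_start, c_end, label="PRODUCT")
product_nested = doc.char_span(n_start, c_end, label="PRODUCT")

# Non-overlapping spans are accepted by doc.ents.
doc.ents = [brand, product_flat]

# Overlapping spans are rejected: "Nestle" would belong to two entities.
try:
    doc.ents = [brand, product_nested]
    overlap_rejected = False
except ValueError:
    overlap_rejected = True

# Workaround from the quote: park the first set of entities in a custom
# extension attribute (hypothetical name) so the second set can use doc.ents.
Doc.set_extension("brand_ents", default=None, force=True)
doc._.brand_ents = [brand]
doc.ents = [product_nested]

print(overlap_rejected)
print([(s.text, s.label_) for s in doc._.brand_ents])
print([(e.text, e.label_) for e in doc.ents])
```

In a real pipeline the move into the extension attribute would happen in a small custom component placed between the two recognizers.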

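Consequences:

If you want a single NER tagger, you have to label the sentence as follows:
Entities: BRAND: Nestle, PRODUCT: Cookie

If you want to train two separate NER taggers (one for BRAND, one for PRODUCT), you can label it like this:
Entities: BRAND: Nestle, PRODUCT: Nestle Cookie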
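To check which of the two labelings a given annotation set follows, a small helper can flag overlapping spans before training. This is only a sketch in plain Python: it assumes annotations are stored as `(start, end, label)` character offsets with an exclusive end, and the function name `overlapping_pairs` is made up for illustration.

```python
from itertools import combinations


def overlapping_pairs(entities):
    """Return every pair of (start, end, label) annotations that share a character."""
    return [
        (a, b)
        for a, b in combinations(sorted(entities), 2)
        if a[0] < b[1] and b[0] < a[1]  # half-open intervals intersect
    ]


text = "The best product in the world is Nestle Cookie."
n = text.index("Nestle")
c = text.index("Cookie")

# Labeling for a single tagger: BRAND and PRODUCT do not share any word.
flat = [(n, n + len("Nestle"), "BRAND"), (c, c + len("Cookie"), "PRODUCT")]
# Labeling for two separate taggers: "Nestle" is inside both spans.
nested = [(n, n + len("Nestle"), "BRAND"), (n, c + len("Cookie"), "PRODUCT")]

print(overlapping_pairs(flat))    # no conflicts, fine for one recognizer
print(overlapping_pairs(nested))  # one conflicting pair, split by label first
```

If the helper reports any pair, the data either needs relabeling or has to be split by label so that each recognizer only sees non-overlapping spans.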