Dropped event messages in OpenNLP: training data is dropped

I have labeled data (labels and text) like this:

```
category1, "train message 1"
category1, "train message 2"
category1, "train message 3"
category2, "train message 4"
category2, "train message 5"
```
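To make the data format concrete, here is a minimal sketch that turns lines like these into a category plus a token list, which is roughly what OpenNLP's `DocumentSample` holds. The names `LabeledLineParser`, `Sample` and `parse` are hypothetical, and the sketch assumes that labels never contain commas, that the text is wrapped in double quotes, and that plain whitespace tokenization is good enough (it uses a record, so Java 16+).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LabeledLineParser {

    /** A (category, tokens) pair, similar in spirit to OpenNLP's DocumentSample. */
    public record Sample(String category, List<String> tokens) {}

    /** Parses a line such as: category1, "train message 1" (hypothetical helper). */
    public static Sample parse(String line) {
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("No comma separator in: " + line);
        }
        String category = line.substring(0, comma).trim();
        String text = line.substring(comma + 1).trim();
        // Strip the surrounding double quotes, if present
        if (text.length() >= 2 && text.startsWith("\"") && text.endsWith("\"")) {
            text = text.substring(1, text.length() - 1);
        }
        // Naive whitespace tokenization (assumption); a real pipeline may use an OpenNLP tokenizer
        List<String> tokens = text.isBlank()
                ? List.of()
                : Arrays.asList(text.trim().split("\\s+"));
        return new Sample(category, tokens);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "category1, \"train message 1\"",
                "category2, \"train message 4\"");
        List<Sample> samples = new ArrayList<>();
        for (String line : lines) {
            samples.add(parse(line));
        }
        samples.forEach(System.out::println);
    }
}
```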

I am trying to train my classification model with the Java OpenNLP library:

```java
DoccatModel model = DocumentCategorizerME.train("pt", sampleStream, params, customFactory);
```

When I train the model, I get strange messages:

```
Indexing events using cutoff of 5
Computing event counts...  done. 5441 events
Dropped event animals*:[bow=live, bow=animals, ng=:live:animals]
Dropped event animals*:[bow=aquariums]
Dropped event animals*:[bow=aquatic, bow=plant, bow=fertilizers, ng=:aquatic:plant,ng=:aquatic:plant:fertilizers, ng=:plant:fertilizers]
Dropped event apparel*:[bow=activewear]
Dropped event apparel*:[bow=one, bow=pieces, ng=:one:pieces]
```

Why am I getting `Dropped event "category": [....]`?
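A likely explanation, sketched in plain Java under the assumption that the indexer behaves like a simple two-pass cutoff filter: with a cutoff of 5 (as the log reports), any feature seen fewer than 5 times across the whole training set is discarded, and an event whose features are all discarded has nothing left to train on, so it is dropped. Short documents such as `aquariums` or `activewear` produce only one or two features, so they are the first to disappear. `CutoffSketch`, `Event` and `applyCutoff` are hypothetical names, not OpenNLP's API, and the `bow=shirt` data is invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CutoffSketch {

    /** One training event: an outcome (category) plus its context (feature strings). */
    public record Event(String outcome, List<String> context) {}

    /**
     * Simplified cutoff filter (not OpenNLP's code): keep a feature only if it occurs
     * at least `cutoff` times over all events; drop events left with no features.
     */
    public static List<Event> applyCutoff(List<Event> events, int cutoff) {
        // Pass 1: count how often each feature occurs across the whole training set
        Map<String, Integer> counts = new HashMap<>();
        for (Event e : events) {
            for (String feature : e.context()) {
                counts.merge(feature, 1, Integer::sum);
            }
        }
        // Pass 2: filter each event's context and drop events that end up empty
        List<Event> kept = new ArrayList<>();
        for (Event e : events) {
            List<String> surviving = new ArrayList<>();
            for (String feature : e.context()) {
                if (counts.get(feature) >= cutoff) {
                    surviving.add(feature);
                }
            }
            if (surviving.isEmpty()) {
                System.err.println("Dropped event " + e.outcome() + ":" + e.context());
            } else {
                kept.add(new Event(e.outcome(), surviving));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        // "bow=aquariums" occurs only once, so with a cutoff of 5 this event is dropped
        events.add(new Event("animals", List.of("bow=aquariums")));
        // Hypothetical data: "bow=shirt" occurs 5 times in total, so these events survive
        for (int i = 0; i < 5; i++) {
            events.add(new Event("apparel", List.of("bow=shirt")));
        }
        List<Event> kept = applyCutoff(events, 5);
        // stderr: Dropped event animals:[bow=aquariums]
        // stdout: 5 of 6 events kept
        System.out.println(kept.size() + " of " + events.size() + " events kept");
    }
}
```

If this is what is happening, the levers are more training text per feature or a lower `Cutoff` value in the `params` passed to `train` (the `TrainingParameters.CUTOFF_PARAM` key); a lower cutoff keeps rare features, but it also keeps more noise.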

I have added a custom factory, and it works:

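```java
import opennlp.tools.doccat.*;

// sampleStream (ObjectStream<DocumentSample>) and params (TrainingParameters) are created elsewhere.
// train(...) throws IOException, so the enclosing method must handle or declare it.
int minNgramSize = 2;
int maxNgramSize = 3;
// Features: single words (bow=...) plus word 2- and 3-grams (ng=...)
DoccatFactory customFactory = new DoccatFactory(new FeatureGenerator[]{
        new BagOfWordsFeatureGenerator(),
        new NGramFeatureGenerator(minNgramSize, maxNgramSize)
});
DoccatModel model = DocumentCategorizerME.train("pt", sampleStream, params, customFactory);
```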
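The `bow=` and `ng=` entries in the log are the feature strings these two generators emit. The plain-Java sketch below imitates that output format for `minNgramSize = 2` and `maxNgramSize = 3`; it is not OpenNLP's source, and `FeatureSketch` and its methods are hypothetical. It produces the same features as the `aquatic plant fertilizers` log line, and it shows that a one-word document such as `aquariums` yields a single `bow=` feature and no n-grams at all.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureSketch {

    /** Imitates the "bow=" features: one feature per token. */
    public static List<String> bagOfWords(String[] tokens) {
        List<String> features = new ArrayList<>();
        for (String token : tokens) {
            features.add("bow=" + token);
        }
        return features;
    }

    /** Imitates the "ng=" features: every run of minGram..maxGram consecutive tokens. */
    public static List<String> nGrams(String[] tokens, int minGram, int maxGram) {
        List<String> features = new ArrayList<>();
        for (int start = 0; start + minGram <= tokens.length; start++) {
            StringBuilder gram = new StringBuilder("ng=");
            for (int len = 1; len <= maxGram && start + len <= tokens.length; len++) {
                gram.append(':').append(tokens[start + len - 1]);
                if (len >= minGram) {
                    features.add(gram.toString());
                }
            }
        }
        return features;
    }

    /** Bag-of-words features followed by n-gram features, as in the log lines. */
    public static List<String> features(String[] tokens, int minGram, int maxGram) {
        List<String> all = new ArrayList<>(bagOfWords(tokens));
        all.addAll(nGrams(tokens, minGram, maxGram));
        return all;
    }

    public static void main(String[] args) {
        // prints [bow=aquatic, bow=plant, bow=fertilizers, ng=:aquatic:plant, ng=:aquatic:plant:fertilizers, ng=:plant:fertilizers]
        System.out.println(features(new String[]{"aquatic", "plant", "fertilizers"}, 2, 3));
        // prints [bow=aquariums]
        System.out.println(features(new String[]{"aquariums"}, 2, 3));
    }
}
```

Note that an n-gram can never occur more often than any word it contains, so if every word of a document falls below the cutoff, every n-gram built from those words does too; adding the n-gram generator cannot keep such a document from being dropped.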