OpenNLP classifier output

I am currently using the following code to train a classifier model:

    final String iterations = "1000";
    final String cutoff = "0";
    InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
    ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
    params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
    params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);

    DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());

    // try-with-resources flushes and closes the file once the model is written
    try (OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"))) {
        model.serialize(modelOut);
    }

    return model;

Everything works fine, and after every run I get the following output:

    Indexing events with TwoPass using cutoff of 0

    Computing event counts...  done. 1474 events
    Indexing...  done.
    Collecting events... Done indexing in 0,03 s.
    Incorporating indexed data for training...
    done.
    Number of Event Tokens: 1474
        Number of Outcomes: 2
      Number of Predicates: 4149
    Computing model parameters...
    Stats: (998/1474) 0.6770691994572592
    ...done.

Can someone explain what this output means, and whether it says anything about accuracy?

Looking at the source, we can tell that this output is produced by the NaiveBayesTrainer::trainModel method:

    public AbstractModel trainModel(DataIndexer di) {
        // ...
        display("done.\n");
        display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
        display("\t    Number of Outcomes: " + numOutcomes + "\n");
        display("\t  Number of Predicates: " + numPreds + "\n");
        display("Computing model parameters...\n");
        MutableContext[] finalParameters = findParameters();
        display("...done.\n");
        // ...
    }

If you look at the findParameters() code, you will notice that it calls the trainingStats() method, which contains the snippet that computes the accuracy:

    private double trainingStats(EvalParameters evalParams) {
        // ...
        double trainingAccuracy = (double) numCorrect / numEvents;
        display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
        return trainingAccuracy;
    }

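TL;DR: the Stats: (998/1474) 0.6770691994572592 line of the output is the accuracy you are looking for. It means that 998 of the 1474 events were classified correctly, roughly 67.7%. As the trainingAccuracy name suggests, this is measured on the training events themselves, so it is likely an optimistic estimate of how the model will do on unseen text.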
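
To make the arithmetic behind that line concrete, here is a minimal, self-contained sketch. It is not OpenNLP code: the class TrainingStatsSketch and its helper methods are hypothetical names made up for illustration, and the spam/ham labels are invented sample data. It only assumes that accuracy is the number of correctly predicted events divided by the total number of events, as in the trainingStats() snippet above.

```java
public class TrainingStatsSketch {

    // Counts the events whose predicted outcome equals the labelled outcome.
    // (Hypothetical helper; OpenNLP does its own bookkeeping internally.)
    static int countCorrect(String[] predicted, String[] labelled) {
        if (predicted.length != labelled.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int numCorrect = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i].equals(labelled[i])) {
                numCorrect++;
            }
        }
        return numCorrect;
    }

    // Same formula as in the trainingStats() snippet: correct / total.
    static double accuracy(int numCorrect, int numEvents) {
        if (numEvents <= 0) {
            throw new IllegalArgumentException("numEvents must be positive");
        }
        return (double) numCorrect / numEvents;
    }

    // Formats the result the way the "Stats:" log line is formatted.
    static String statsLine(int numCorrect, int numEvents) {
        return "Stats: (" + numCorrect + "/" + numEvents + ") "
                + accuracy(numCorrect, numEvents);
    }

    public static void main(String[] args) {
        // Invented toy data: 3 of 4 predictions match their labels.
        String[] predicted = {"spam", "ham", "spam", "ham"};
        String[] labelled  = {"spam", "ham", "ham",  "ham"};
        int numCorrect = countCorrect(predicted, labelled);
        System.out.println(statsLine(numCorrect, labelled.length)); // Stats: (3/4) 0.75

        // The counts from the log in the question.
        System.out.println(statsLine(998, 1474));
    }
}
```

The second call uses the counts from the question's log, so it should print a line of the same form as the Stats line there.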