OpenNLP 分类器输出
OpenNLP classifier output
目前我正在使用以下代码来训练分类器模型:
final String iterations = "1000";
final String cutoff = "0";
InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"));
model.serialize(modelOut);
return model;
一切顺利,每次 运行 之后我都会得到以下输出:
Indexing events with TwoPass using cutoff of 0
Computing event counts... done. 1474 events
Indexing... done.
Collecting events... Done indexing in 0,03 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1474
Number of Outcomes: 2
Number of Predicates: 4149
Computing model parameters...
Stats: (998/1474) 0.6770691994572592
...done.
谁能解释一下这个输出是什么意思?如果它能说明准确性?
查看source, we can tell this output is done by NaiveBayesTrainer::trainModel方法:
public AbstractModel trainModel(DataIndexer di) {
// ...
display("done.\n");
display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
display("\t Number of Outcomes: " + numOutcomes + "\n");
display("\t Number of Predicates: " + numPreds + "\n");
display("Computing model parameters...\n");
MutableContext[] finalParameters = findParameters();
display("...done.\n");
// ...
}
如果您查看 findParameters()
代码,您会注意到它调用 trainingStats()
方法,其中包含计算准确度的代码片段:
private double trainingStats(EvalParameters evalParams) {
// ...
double trainingAccuracy = (double) numCorrect / numEvents;
display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
return trainingAccuracy;
}
TL;DR 输出的 Stats: (998/1474) 0.6770691994572592
部分是您要查找的准确度。
目前我正在使用以下代码来训练分类器模型:
final String iterations = "1000";
final String cutoff = "0";
InputStreamFactory dataIn = new MarkableFileInputStreamFactory(new File("src/main/resources/trainingSets/classifierA.txt"));
ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, iterations);
params.put(TrainingParameters.CUTOFF_PARAM, cutoff);
params.put(AbstractTrainer.ALGORITHM_PARAM, NaiveBayesTrainer.NAIVE_BAYES_VALUE);
DoccatModel model = DocumentCategorizerME.train("NL", sampleStream, params, new DoccatFactory());
OutputStream modelOut = new BufferedOutputStream(new FileOutputStream("src/main/resources/models/model.bin"));
model.serialize(modelOut);
return model;
一切顺利,每次 运行 之后我都会得到以下输出:
Indexing events with TwoPass using cutoff of 0
Computing event counts... done. 1474 events
Indexing... done.
Collecting events... Done indexing in 0,03 s.
Incorporating indexed data for training...
done.
Number of Event Tokens: 1474
Number of Outcomes: 2
Number of Predicates: 4149
Computing model parameters...
Stats: (998/1474) 0.6770691994572592
...done.
谁能解释一下这个输出是什么意思?如果它能说明准确性?
查看source, we can tell this output is done by NaiveBayesTrainer::trainModel方法:
public AbstractModel trainModel(DataIndexer di) {
// ...
display("done.\n");
display("\tNumber of Event Tokens: " + numUniqueEvents + "\n");
display("\t Number of Outcomes: " + numOutcomes + "\n");
display("\t Number of Predicates: " + numPreds + "\n");
display("Computing model parameters...\n");
MutableContext[] finalParameters = findParameters();
display("...done.\n");
// ...
}
如果您查看 findParameters()
代码,您会注意到它调用 trainingStats()
方法,其中包含计算准确度的代码片段:
private double trainingStats(EvalParameters evalParams) {
// ...
double trainingAccuracy = (double) numCorrect / numEvents;
display("Stats: (" + numCorrect + "/" + numEvents + ") " + trainingAccuracy + "\n");
return trainingAccuracy;
}
TL;DR 输出的 Stats: (998/1474) 0.6770691994572592
部分是您要查找的准确度。