Stanford Classifier producing wrong results

I am trying to do basic text classification with the Stanford Classifier. My example data set is based on Ham or Spam.

Here is my code:

Properties props = new Properties();
ColumnDataClassifier cdc = new ColumnDataClassifier(props);

Classifier<String, String> cl = cdc.makeClassifier(cdc.readTrainingExamples("data.train"));

for (String line : ObjectBank.getLineIterator("data.test", "utf-8")) {
    Datum<String, String> d = cdc.makeDatumFromLine(line);
    System.out.println(line + "  ==>  " + cl.classOf(d));
}

However, no matter what text I try to classify, it is always classified as Ham. The following message is clearly spam, yet it is still classified as ham:

FREE MESSAGE Activate your 500 FREE Text Messages by replying to this message with the word FREE For terms & conditions, visit www.example.com

Where is my mistake?

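My mistake was that I did not provide any properties. With an empty Properties object no feature flags are set, so the classifier has nothing to tell the messages apart by: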

Properties props = new Properties();
ColumnDataClassifier cdc = new ColumnDataClassifier(props);

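You can specify the properties directly in code:

// set up the ColumnDataClassifier properties
Properties props = new Properties();
// add a class-prior (bias) feature
props.setProperty("useClassFeature", "true");
// split the text in column 1 on whitespace and use each word as a feature
props.setProperty("1.useSplitWords", "true");
props.setProperty("1.splitWordsRegexp", "\\s+");
// column 0 holds the gold label (assumes tab-separated "label<TAB>text" lines)
props.setProperty("goldAnswerColumn", "0");
ColumnDataClassifier cdc = new ColumnDataClassifier(props);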

Or you can supply a properties file:

// propFile is the path to the properties file, as a String
ColumnDataClassifier cdc = new ColumnDataClassifier(propFile);
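
For illustration, here is a rough sketch of what such a properties file might contain for the ham/spam case. The file name spam.prop, the class name SpamPropsSketch, and the data layout (label in column 0, message text in column 1, tab-separated) are assumptions and do not come from the question. The sketch only parses the text with java.util.Properties so it runs without the Stanford jar. Handing the result to ColumnDataClassifier is left as a comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SpamPropsSketch {
    // Hypothetical contents of a properties file (for example "spam.prop").
    // Assumes tab-separated data lines: label in column 0, message text in column 1.
    static final String PROP_FILE_TEXT = String.join("\n",
            "useClassFeature=true",
            "1.useSplitWords=true",
            "1.splitWordsRegexp=\\\\s+",
            "goldAnswerColumn=0",
            "trainFile=data.train",
            "testFile=data.test");

    // Parse the text the way java.util.Properties parses a .properties file.
    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(PROP_FILE_TEXT));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load();
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = " + props.getProperty(key));
        }
        // With the Stanford Classifier jar on the classpath, the same object
        // could be passed on: new ColumnDataClassifier(props)
    }
}
```

Note that a backslash has to be doubled inside a properties file, so the regular expression is written as \\s+ in the file and read back as \s+.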