Stanford Classifier producing wrong results
I am trying to do basic text classification with the Stanford Classifier. My example data set is based on Ham or Spam.
Here is my code:
Properties props = new Properties();
ColumnDataClassifier cdc = new ColumnDataClassifier(props);
Classifier<String, String> cl = cdc.makeClassifier(cdc.readTrainingExamples("data.train"));
for (String line : ObjectBank.getLineIterator("data.test", "utf-8")) {
    Datum<String, String> d = cdc.makeDatumFromLine(line);
    System.out.println(line + " ==> " + cl.classOf(d));
}
However, no matter what text I try to classify, it is always classified as Ham. The following message is clearly spam, yet it is still classified as ham:
FREE MESSAGE Activate your 500 FREE Text Messages by replying to this message with the word FREE For terms & conditions, visit www.example.com
Where is my mistake?
My mistake was that I did not provide any properties, so the classifier had no features to learn from:
Properties props = new Properties();
ColumnDataClassifier cdc = new ColumnDataClassifier(props);
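You can specify the properties directly in the code:
// Feature settings for ColumnDataClassifier (values are illustrative, not tuned).
// Assumes tab-separated data: class label in column 0, message text in column 1.
Properties props = new Properties();
props.setProperty("goldAnswerColumn", "0");       // column holding the Ham/Spam label
props.setProperty("useClassFeature", "true");     // add a class prior (bias) feature
props.setProperty("1.useSplitWords", "true");     // use the words of column 1 as features
props.setProperty("1.splitWordsRegexp", "\\s+");  // split column 1 on whitespace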
Alternatively, you can supply a properties file:
ColumnDataClassifier cdc = new ColumnDataClassifier(propFile);
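For reference, here is a rough sketch of what such a properties file might contain and how it maps onto a java.util.Properties object. The property names follow the ColumnDataClassifier documentation as I understand it. The file contents, the column layout (label in column 0, text in column 1) and the class name ClassifierPropsSketch are assumptions for illustration, and the values are not tuned. The sketch uses only the JDK, so the classifier call itself is left as a comment.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ClassifierPropsSketch {

    // Hypothetical contents of a properties file (e.g. "spam.prop").
    // Assumed layout: tab-separated lines, label in column 0, text in column 1.
    // In the file itself the regexp line reads: 1.splitWordsRegexp=\\s+
    static final String PROP_TEXT = String.join("\n",
        "useClassFeature=true",
        "1.useSplitWords=true",
        "1.splitWordsRegexp=\\\\s+",
        "goldAnswerColumn=0",
        "displayedColumn=1",
        "intern=true",
        "sigma=3",
        "useQN=true",
        "QNsize=15",
        "tolerance=1e-4");

    // Parses properties-file text exactly as Properties.load would parse the file.
    static Properties loadProps(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = loadProps(PROP_TEXT);
        props.list(System.out);
        // With the Stanford Classifier jar on the classpath, the next step would be:
        // ColumnDataClassifier cdc = new ColumnDataClassifier(props);
    }
}
```

With settings along these lines the classifier gets one feature per word of the message, which an empty Properties object does not give it. Whether this is enough to separate Ham from Spam on your data is something you would have to check on a held-out test set.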