I got different results when I retrained the sentiment model with Stanford CoreNLP and compared them with the related paper's results
I downloaded stanford-corenlp-full-2015-12-09 and created a trained model with the following command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
After training finished, I found many intermediate model files in the directory; each checkpoint name, e.g. model-0024-79.82.ser.gz, appears to encode the training iteration and the dev-set accuracy.
[screenshot: the model list]
Then I ran the evaluation tool that ships with the package as follows:
java -cp * edu.stanford.nlp.sentiment.Evaluate -model model-0024-79.82.ser.gz -treebank test.txt
test.txt comes from trainDevTestTrees_PTB.zip. Here is the output:
F:\trainDevTestTrees_PTB\trees>java -cp * edu.stanford.nlp.sentiment.Evaluate -model model-0024-79.82.ser.gz -treebank test.txt
EVALUATION SUMMARY
Tested 82600 labels
65331 correct
17269 incorrect
0.790932 accuracy
Tested 2210 roots
890 correct
1320 incorrect
0.402715 accuracy
Label confusion matrix
Guess/Gold       0      1      2      3      4   Marg. (Guess)
0              551    340     87     32      6     1016
1              956   5348   2476    686    191     9657
2              354   2812  51386   3097    467    58116
3              146    744   2525   6804   1885    12104
4                1     11     74    379   1242     1707
Marg. (Gold)  2008   9255  56548  10998   3791
0 prec=0.54232, recall=0.2744, spec=0.99423, f1=0.36442
1 prec=0.5538, recall=0.57785, spec=0.94125, f1=0.56557
2 prec=0.8842, recall=0.90871, spec=0.74167, f1=0.89629
3 prec=0.56213, recall=0.61866, spec=0.92598, f1=0.58904
4 prec=0.72759, recall=0.32762, spec=0.9941, f1=0.4518
Root label confusion matrix
Guess/Gold      0    1    2    3    4   Marg. (Guess)
0              50   60   12    9    3    134
1             161  370  147   94   36    808
2              31  103  102   60   32    328
3              36   97  123  305  265    826
4               1    3    5   42   63    114
Marg. (Gold)  279  633  389  510  399
0 prec=0.37313, recall=0.17921, spec=0.9565, f1=0.24213
1 prec=0.45792, recall=0.58452, spec=0.72226, f1=0.51353
2 prec=0.31098, recall=0.26221, spec=0.87589, f1=0.28452
3 prec=0.36925, recall=0.59804, spec=0.69353, f1=0.45659
4 prec=0.55263, recall=0.15789, spec=0.97184, f1=0.24561
Approximate Negative label accuracy: 0.638817
Approximate Positive label accuracy: 0.697140
Combined approximate label accuracy: 0.671925
Approximate Negative root label accuracy: 0.702851
Approximate Positive root label accuracy: 0.742574
Combined approximate root label accuracy: 0.722680
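Incidentally, the "approximate" accuracies above are consistent with collapsing the five classes into a binary decision: gold labels 0-1 count as negative, 3-4 as positive, and items whose gold label is 2 (neutral) are ignored. The following is a minimal sketch (my own code, not part of CoreNLP) that reproduces the three approximate root figures from the root label confusion matrix above:

// Sketch: reproduce the approximate root accuracies by collapsing classes:
// gold 0-1 = negative, gold 3-4 = positive, gold 2 (neutral) is skipped.
// Matrix rows = guess, columns = gold, copied from the output above.
public class ApproxRootAccuracy {
    public static void main(String[] args) {
        int[][] m = {
            { 50,  60,  12,   9,   3},
            {161, 370, 147,  94,  36},
            { 31, 103, 102,  60,  32},
            { 36,  97, 123, 305, 265},
            {  1,   3,   5,  42,  63},
        };
        long negCorrect = 0, negTotal = 0, posCorrect = 0, posTotal = 0;
        for (int guess = 0; guess < 5; guess++) {
            for (int gold = 0; gold < 5; gold++) {
                if (gold <= 1) {            // gold label is negative
                    negTotal += m[guess][gold];
                    if (guess <= 1) negCorrect += m[guess][gold];
                } else if (gold >= 3) {     // gold label is positive
                    posTotal += m[guess][gold];
                    if (guess >= 3) posCorrect += m[guess][gold];
                }                           // gold == 2 (neutral) ignored
            }
        }
        System.out.printf("negative: %f%n", (double) negCorrect / negTotal);
        System.out.printf("positive: %f%n", (double) posCorrect / posTotal);
        System.out.printf("combined: %f%n",
                (double) (negCorrect + posCorrect) / (negTotal + posTotal));
    }
}

This prints 0.702851, 0.742574 and 0.722680, matching the summary above; the same collapsing applied to the label confusion matrix reproduces the label figures (0.638817, 0.697140, 0.671925).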
The fine-grained and positive/negative accuracies differ considerably from those reported in the paper: Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631-1642.
The paper reports higher fine-grained and positive/negative accuracies than mine.
[screenshot: the results table from the paper]
Is there anything wrong with what I did? Why are my results different from the paper's?
The short answer is that the paper used a different system, which was written in Matlab; the Java system does not match the paper. We do, however, distribute the binary model that we trained in Matlab with the English models jar. So you can run that binary model with Stanford CoreNLP, but at present you cannot train a model with similar performance using Stanford CoreNLP.
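For completeness, running that distributed model is just a matter of adding the sentiment annotator to a standard pipeline. Here is a minimal sketch using the standard CoreNLP API as of the 3.6.0 release you downloaded; the sample sentence is arbitrary:

import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class SentimentDemo {
    public static void main(String[] args) {
        // The sentiment annotator loads the distributed model from the
        // English models jar by default.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,parse,sentiment");
        // To run your own retrained model instead, point the annotator at it:
        // props.setProperty("sentiment.model", "model-0024-79.82.ser.gz");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation annotation = new Annotation("The movie was gripping from start to finish.");
        pipeline.annotate(annotation);

        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            // The predicted class is a string such as "Negative" or "Positive".
            System.out.println(sentence.get(SentimentCoreAnnotations.SentimentClass.class));
        }
    }
}

Compile and run it with the CoreNLP jars (including the English models jar) on the classpath, e.g. on Windows: javac -cp "*" SentimentDemo.java and then java -cp ".;*" SentimentDemo.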