Why does Mallet text classification output the same value, 1.0, for all test files?
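I'm learning to use Mallet text classification from the command line. The estimated output value for the different classes is always the same: 1.0. I don't know what I'm doing wrong. Can you help?
Mallet version: E:\Mallet\mallet-2.0.8RC3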
// There is one txt file about a cat breed (catmaterial.txt) in the cat dir.
//command 1
C:\Users\toshiba>mallet import-dir --input E:\Mallet\testmaterial\cat --output E:\Mallet\testmaterial\cat.mallet --remove-stopwords
//command 1 output
Labels =
E:\Mallet\testmaterial\cat
//command 2, save classifier as catClass.classifier
C:\Users\toshiba>mallet train-classifier --input E:\Mallet\testmaterial\cat.mallet --trainer NaiveBayes --output-classifier E:\Mallet\testmaterial\catClass.classifier
//command 2 output
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0
-------------------- Trial 0 --------------------
Trial 0 Training NaiveBayesTrainer with 1 instances
Trial 0 Training NaiveBayesTrainer finished
No examples with predicted label !
No examples with true label !
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer training data accuracy = 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
No examples with predicted label !
Trial 0 Trainer NaiveBayesTrainer test data precision() = 1.0
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data recall() = 1.0
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data F1() = 1.0
Trial 0 Trainer NaiveBayesTrainer test data accuracy = NaN
NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
Summary. test precision() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test recall() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test f1() mean = 1.0 stddev = 0.0 stderr = 0.0
//command 3, estimate the classes of three files about a cat, a deer and a dog. The cat file is the same one used to build cat.mallet
C:\Users\toshiba>mallet classify-dir --input E:\Mallet\testmaterial\test_cat_dir --output - --classifier E:\Mallet\testmaterial\catClass.classifier
//command 3 output
file:/E:/Mallet/testmaterial/test_cat_dir/catmaterial.txt 1.0
file:/E:/Mallet/testmaterial/test_cat_dir/deertext.txt 1.0
file:/E:/Mallet/testmaterial/test_cat_dir/dogmaterial.txt 1.0
// Why is the output 1.0 for all three files?
C:\Users\toshiba>
Can you help?
Thanks
+++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++++++
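Update:
Thanks for your answer, but it still outputs 1.0 for all files.
My idea was to put some dog files in a dog directory, use those dog files as instances to train the model, and then classify some files in test_dir to see the results.
I tried it based on my understanding of your suggestion, but it still outputs the same 1.0.
Could you help me with the command lines below?
In E:\Mallet\train_dir\dog there are 4 dog txt files (dog 2.txt, dog4.txt, dog5.txt, dogmaterial.txt).
In E:\Mallet\test_dir there are 9 txt files (cat2.txt, catmaterial.txt, deermaterial.txt, dog3.txt, dog6.txt, dog 2.txt, dog4.txt, dog5.txt, dogmaterial.txt).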
C:\Users\toshiba>mallet import-dir --input E:\Mallet\train_dir\dog --output E:\Mallet\classifier_diranimal.mallet --remove-stopwords
Labels =
E:\Mallet\train_dir\dog
C:\Users\toshiba>mallet train-classifier --input E:\Mallet\classifier_diranimal.mallet --trainer NaiveBayes --output-classifier E:\Mallet\classifier_diranimalClass.classifier
Training portion = 1.0
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.0
-------------------- Trial 0 --------------------
Trial 0 Training NaiveBayesTrainer with 4 instances
Trial 0 Training NaiveBayesTrainer finished
No examples with predicted label !
No examples with true label !
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer training data accuracy = 1.0
Trial 0 Trainer NaiveBayesTrainer Test Data Confusion Matrix
No examples with predicted label !
Trial 0 Trainer NaiveBayesTrainer test data precision() = 1.0
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data recall() = 1.0
No examples with predicted label !
No examples with true label !
Trial 0 Trainer NaiveBayesTrainer test data F1() = 1.0
Trial 0 Trainer NaiveBayesTrainer test data accuracy = NaN
NaiveBayesTrainer
Summary. train accuracy mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test accuracy mean = NaN stddev = NaN stderr = NaN
Summary. test precision() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test recall() mean = 1.0 stddev = 0.0 stderr = 0.0
Summary. test f1() mean = 1.0 stddev = 0.0 stderr = 0.0
C:\Users\toshiba>mallet classify-dir --input E:\Mallet\test_dir --output - --classifier E:\Mallet\classifier_diranimalClass.classifier
file:/E:/Mallet/test_dir/cat2.txt 1.0
file:/E:/Mallet/test_dir/catmaterial.txt 1.0
file:/E:/Mallet/test_dir/deertext.txt 1.0
file:/E:/Mallet/test_dir/dog%202.txt 1.0
file:/E:/Mallet/test_dir/dog3.txt 1.0
file:/E:/Mallet/test_dir/dog4.txt 1.0
file:/E:/Mallet/test_dir/dog5.txt 1.0
file:/E:/Mallet/test_dir/dog6.txt 1.0
file:/E:/Mallet/test_dir/dogmaterial.txt 1.0
C:\Users\toshiba>
Thanks.
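Both runs above import a single directory, so each `Labels =` list has exactly one entry (E:\Mallet\testmaterial\cat, then E:\Mallet\train_dir\dog), whether training uses 1 instance or 4. A probabilistic classifier normalizes its scores across the known labels, so with only one label the sole possible posterior is 1.0, presumably the value classify-dir prints for every file. The toy multinomial Naive Bayes below is not Mallet's code; `train_nb` and `posterior` are hypothetical helpers written only to illustrate the effect:

```python
import math
from collections import Counter


def train_nb(docs_by_label, alpha=1.0):
    """Toy multinomial Naive Bayes. docs_by_label maps label -> list of token lists."""
    vocab = {w for docs in docs_by_label.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter(w for d in docs for w in d)
        n = sum(counts.values())
        log_prior = math.log(len(docs) / total_docs)
        # Laplace-smoothed log P(word | label) over the shared vocabulary.
        log_lik = {w: math.log((counts[w] + alpha) / (n + alpha * len(vocab)))
                   for w in vocab}
        model[label] = (log_prior, log_lik)
    return model


def posterior(model, tokens):
    """Return P(label | tokens) for every label; unseen words are ignored."""
    scores = {label: log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
              for label, (log_prior, log_lik) in model.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exps.values())  # normalizing over labels forces the results to sum to 1
    return {label: e / z for label, e in exps.items()}


# One label ("dog" only): every document gets probability 1.0.
dog_only = train_nb({"dog": [["dog", "bark"], ["puppy", "bark"]]})
print(posterior(dog_only, ["cat", "meow"]))  # {'dog': 1.0}
print(posterior(dog_only, ["dog", "bark"]))  # {'dog': 1.0}

# Two labels: the scores can now differ between documents.
cat_and_dog = train_nb({"dog": [["dog", "bark"]], "cat": [["cat", "meow"]]})
print(posterior(cat_and_dog, ["cat", "meow"]))  # cat ≈ 0.8, dog ≈ 0.2
```

The number of training instances does not change this: only adding a second label gives the classifier something to choose between.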
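There are two input options. import-dir treats directories as classes and each file within each directory as one input instance. import-file reads the input file line by line and treats the fields within each line as the label and the instance data. You are using the files-in-directories input type, so you are creating a classifier with one class and one instance. I'm guessing you want the lines-in-file type.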
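Whichever import command is used, the training data needs at least two labels for the output scores to differ. As a minimal sketch of preparing lines-in-file input, the Python below assumes Mallet's default import-file line format of `name label text`, one instance per line (the `--line-regex`, `--name`, `--label` and `--data` options of `mallet import-file` control this), and a hypothetical layout with one subdirectory per class, such as train_dir\cat and train_dir\dog. `dir_to_lines` is a hypothetical helper, not part of Mallet:

```python
import re
from pathlib import Path


def dir_to_lines(train_dir, out_file):
    """Write one 'name label text' line per .txt file in train_dir/<label>/.

    Assumes (hypothetically) that each immediate subdirectory of train_dir
    is a class label, e.g. train_dir/cat/*.txt and train_dir/dog/*.txt.
    Returns the list of labels found.
    """
    root = Path(train_dir)
    labels = []
    with open(out_file, "w", encoding="utf-8") as out:
        for label_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            labels.append(label_dir.name)
            for txt in sorted(label_dir.glob("*.txt")):
                # Collapse newlines and tabs so each document fits on one line.
                text = " ".join(txt.read_text(encoding="utf-8", errors="ignore").split())
                # The name field must not contain whitespace ("dog 2.txt" would split).
                name = re.sub(r"\s+", "_", f"{label_dir.name}/{txt.name}")
                out.write(f"{name} {label_dir.name} {text}\n")
    return labels
```

If the returned list has only one label, the classifier will again report 1.0 for everything. The resulting file could then be imported with something like `mallet import-file --input E:\Mallet\animals.txt --output E:\Mallet\animals.mallet --remove-stopwords` and trained as before; these paths are illustrative only.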