python sklearn cross_validation /number of labels does not match number of samples
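I'm taking a machine learning course and want to split my data into a training set and a test set. I want to split it, train a DecisionTree on the training part, and then print the score on my test set. The cross-validation parameters in my code were given to me. Does anyone see what I'm doing wrong?
The error I get is as follows: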
Traceback (most recent call last):
File "/home/stephan/ud120-projects/validation/validate_poi.py", line 36, in <module>
clf = clf.fit(features_train, labels_train)
File "/home/stephan/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 221, in fit
"number of samples=%d" % (len(y), n_samples))
ValueError: Number of labels=29 does not match number of samples=66
Here is my code:
import pickle
import sys
sys.path.append("../tools/")
from feature_format import featureFormat, targetFeatureSplit
data_dict = pickle.load(open("../final_project/final_project_dataset.pkl", "r") )
features_list = ["poi", "salary"]
data = featureFormat(data_dict, features_list)
labels, features = targetFeatureSplit(data)
from sklearn import tree
from sklearn import cross_validation
features_train, labels_train, features_test, labels_test = \
cross_validation.train_test_split(features, labels, random_state=42, test_size=0.3)
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features_train, labels_train)
print clf.score(features_test, labels_test)
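Your variables don't seem to match the order in which train_test_split returns its results. It returns the train and test parts of the first array, then the train and test parts of the second array. In your code, labels_train actually receives the test features (29 rows), which is why fit sees 66 samples but 29 labels.
Try: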
features_train, features_test, labels_train, labels_test = ...
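To illustrate the return order, here is a minimal sketch on synthetic data. The data below is an assumption standing in for the original dataset (95 samples, chosen so the split sizes match the 66 and 29 in the traceback), and it imports from sklearn.model_selection, which is where train_test_split lives in newer scikit-learn releases; the question's sklearn.cross_validation module is the older location.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption, not the original dataset):
# 95 samples with a single feature each, and binary labels.
rng = np.random.RandomState(42)
features = rng.rand(95, 1)
labels = (features[:, 0] > 0.5).astype(int)

# train_test_split returns (train, test) for the first array,
# followed by (train, test) for the second array.
features_train, features_test, labels_train, labels_test = train_test_split(
    features, labels, random_state=42, test_size=0.3
)

# Features and labels now have matching lengths within each split.
print(len(features_train), len(labels_train))  # 66 66
print(len(features_test), len(labels_test))    # 29 29

clf = DecisionTreeClassifier(random_state=42)
clf.fit(features_train, labels_train)
print(clf.score(features_test, labels_test))
```

With the unpacking order from the question, labels_train would instead hold the 29 test features, which reproduces the mismatch reported by fit.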
You need to pass test_size = 0.5 to the train_test_split function:
train_test_split(...,test_size=0.5,...)