Using a bagging algorithm with multiple models
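I am trying to build a model for the LasVegasTripAdvisorReviews-Dataset using a bagging algorithm, but I get an error (Multilabel and multi-output classification is not supported).
Can you help me and tell me how to resolve this error?
Regards
The attachment contains a link to the Las Vegas dataset: LasVegasTripAdvisorReviews-Dataset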
# Voting Ensemble for Classification
import pandas
from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier,GradientBoostingClassifier,AdaBoostClassifier,RandomForestClassifier
url = "h:/LasVegasTripAdvisorReviews-Dataset.csv"
names = ['User country','Nr. reviews','Nr. hotel reviews','Helpful votes','Period of stay','Traveler type','Pool','Gym','Tennis court','Spa','Casino','Free internet','Hotel name','Hotel stars','Nr. rooms','User continent','Member years','Review month','Review weekday','Score']
dataframe = pandas.read_csv(url, names=names)
array = dataframe.values
X = array[:,:]
Y = array[:,:]
seed = 7
kfold = model_selection.KFold(n_splits=10, random_state=seed)
# create the sub models
estimators = []
model1 = AdaBoostClassifier()
estimators.append(('AdaBoost', model1))
model2 = GradientBoostingClassifier()
estimators.append(('GradientBoosting', model2))
model3 = RandomForestClassifier()
estimators.append(('RandomForest', model3))
# create the ensemble model
ensemble = VotingClassifier(estimators)
results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())
Stack trace:
NotImplementedError Traceback (most recent call last)
<ipython-input-9-bda887b4022f> in <module>
27 # create the ensemble model
28 ensemble = VotingClassifier(estimators)
---> 29 results = model_selection.cross_val_score(ensemble, X, Y, cv=kfold)
30 print(results.mean())
/usr/local/lib/python3.5/dist-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch, error_score)
400 fit_params=fit_params,
401 pre_dispatch=pre_dispatch,
--> 402 error_score=error_score)
403 return cv_results['test_score']
404
...
...
NotImplementedError: Multilabel and multi-output classification is not supported.
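You have these lines:
X = array[:,:]
Y = array[:,:]
This means your feature matrix (X) and your target vector (Y) are identical: both contain every column of the dataset.
You need to select exactly one column as your Y.
For example, suppose you want the last column to be Y.
Then you should change the lines above to:
X = array[:,:-1]
Y = array[:,-1]
This should resolve the error you are seeing. The error essentially means: "I don't support more than one column in Y."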
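As a rough illustration of the fix, here is a minimal sketch that runs the same voting ensemble with a one-column target. It assumes a synthetic, fully numeric dataset from `make_classification` in place of the CSV (the row and feature counts are made up), and it uses `array[:, -1]` so that Y is one-dimensional. It also passes `shuffle=True` to `KFold`, since recent scikit-learn releases reject `random_state` without it. If the real dataset contains string columns (names such as 'User country' suggest it does), those would need to be encoded as numbers first, which this sketch does not cover.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the CSV (an assumption, not the real data):
# 200 rows, 5 numeric feature columns, and the class label as the last column.
features, labels = make_classification(n_samples=200, n_features=5, random_state=7)
array = np.column_stack([features, labels])

X = array[:, :-1]  # every column except the last: the feature matrix
Y = array[:, -1]   # only the last column, as a 1-D target vector

# Same three sub-models as in the question.
estimators = [
    ("AdaBoost", AdaBoostClassifier(random_state=7)),
    ("GradientBoosting", GradientBoostingClassifier(random_state=7)),
    ("RandomForest", RandomForestClassifier(random_state=7)),
]
ensemble = VotingClassifier(estimators)

# shuffle=True is required whenever random_state is set in recent versions.
kfold = KFold(n_splits=5, shuffle=True, random_state=7)
results = cross_val_score(ensemble, X, Y, cv=kfold)
print(results.mean())
```

With Y reduced to a single column, `cross_val_score` returns one accuracy value per fold instead of raising the error.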