Use of OneClassSVM with cross_val_score
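I would like to use cross_val_score to validate my OneClassSVM on its training set. Doing so produces the error message shown below.
Could it be that the call fails because OneClassSVM is an unsupervised algorithm and there is no "y" vector to pass to cross_val_score?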
clf = svm.OneClassSVM(nu=_nu, kernel=_kernel, gamma=_gamma, random_state=_random_state, cache_size=_cache_size)
scores = cross_val_score(estimator=clf, X=X_scaled, scoring='accuracy', cv=5)
PS: I realize that the "y" vector is optional in cross_val_score. Still, the error makes me suspect that the missing "y" vector is the cause.
  File "/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 140, in cross_val_score
    for train, test in cv_iter)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 758, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 608, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 571, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 109, in apply_async
    result = ImmediateResult(func)
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 326, in __init__
    self.results = batch()
  File "/usr/local/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 131, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 260, in _fit_and_score
    test_score = _score(estimator, X_test, y_test, scorer)
  File "/usr/local/lib/python2.7/site-packages/sklearn/model_selection/_validation.py", line 286, in _score
    score = scorer(estimator, X_test)
TypeError: __call__() takes at least 4 arguments (3 given)
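I am assuming that you are using OneClassSVM for outlier detection (which is what it is implemented for in scikit-learn, rather than for classification tasks).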
The documentation of cross_val_score says this about y:
y : array-like, optional, default: None
    The target variable to try to predict in the case of supervised learning.
Note the words "supervised learning" there.
So when you do this:
clf = svm.OneClassSVM(nu=_nu, kernel=_kernel, gamma=_gamma,
random_state=_random_state, cache_size=_cache_size)
scores = cross_val_score(estimator=clf, X=X_scaled, scoring='accuracy', cv=5)
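Your assumption is correct: OneClassSVM is an unsupervised model and does not need the y parameter. So far, so good.
However, you have also set the scoring parameter to 'accuracy', and that is where the error comes from. The string 'accuracy' selects a scorer built on the [accuracy_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) metric, whose signature is:
accuracy_score(y_true, y_pred, ...)
Both the true and the predicted y are required there (neither is optional). Because no y was supplied, cross_val_score calls the scorer as scorer(estimator, X_test), as the last frame of the traceback shows, and the scorer fails because it has no y_true to compare against. Hence the error.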
I hope that makes sense.
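To make the point about the signature concrete, one can inspect accuracy_score directly. This is only a minimal sketch, assuming a scikit-learn version that exposes sklearn.metrics.accuracy_score; the helper name required_params is hypothetical and not part of scikit-learn.

```python
import inspect

from sklearn.metrics import accuracy_score


def required_params(func):
    """Return the names of positional parameters that have no default value.

    Hypothetical helper for illustration only.
    """
    sig = inspect.signature(func)
    return [
        name
        for name, p in sig.parameters.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]


# accuracy_score compares true labels with predictions, so both are mandatory.
print(required_params(accuracy_score))
```

The printed list should contain y_true and y_pred, which is why a scorer based on this metric cannot be evaluated when no y is available.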
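Solution:
As stated in this answer here, "the notion of accuracy is not appropriate in one-class SVM." However, if you still intend to use accuracy, you need ground-truth y for the data you provide. In short, y should consist of +1 and -1 values, depending on whether each sample is actually an inlier (+1) or an outlier (-1).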
I use +1 and -1 because those are the values that OneClassSVM.predict() returns:
predict(X)
Perform regression on samples in X. For an one-class model, +1 or -1 is returned.
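If ground-truth labels are available, the accuracy route could look roughly like the sketch below. It assumes a recent scikit-learn release (for gamma="scale") and uses synthetic data; the data, the nu value, and the choice of a shuffled StratifiedKFold are illustrative assumptions, not something taken from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Synthetic data (assumption for illustration): a tight inlier cluster
# plus a few widely scattered outliers.
X_inliers = 0.3 * rng.randn(90, 2)
X_outliers = rng.uniform(low=-4, high=4, size=(10, 2))
X = np.vstack([X_inliers, X_outliers])

# Ground truth in the same encoding that OneClassSVM.predict() returns:
# +1 for inliers, -1 for outliers.
y = np.hstack([np.ones(90, dtype=int), -np.ones(10, dtype=int)])

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")

# Stratified, shuffled folds so that every test fold contains some outliers.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# y is used only by the scorer; OneClassSVM.fit() ignores it.
scores = cross_val_score(clf, X, y, scoring="accuracy", cv=cv)
print(scores)
```

With y supplied, the accuracy scorer has true labels to compare against, so the call no longer fails for lack of them.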
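Otherwise, you need to find some other scoring metric that gives a meaningful score for the predictions on X without any ground-truth y, or design your own scoring method for evaluating outlier detection on your data.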
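As one possible shape for such a custom scorer, cross_val_score also accepts a callable for scoring. The sketch below assumes a recent scikit-learn release and synthetic data; the function inlier_fraction is a hypothetical example and not a standard metric. It simply reports the share of held-out samples predicted as inliers, which can be compared against the expected 1 - nu.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import OneClassSVM


def inlier_fraction(estimator, X, y=None):
    """Hypothetical label-free score: share of samples predicted as inliers (+1)."""
    return float(np.mean(estimator.predict(X) == 1))


rng = np.random.RandomState(0)
X = rng.randn(200, 2)  # synthetic, unlabeled data (assumption for illustration)

clf = OneClassSVM(nu=0.1, kernel="rbf", gamma="scale")

# No y is passed; the callable is invoked on each held-out fold.
scores = cross_val_score(clf, X, scoring=inlier_fraction, cv=5)
print(scores)
```

Whether this number is meaningful depends on the application; it is a sanity check on the fitted boundary rather than a measure of detection quality.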
Feel free to ask if you need more help.