scikit-learn fit() raises an error after normalising the data
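I have been trying to do the following:
- Create the X features and the y target from a dataset
- Split the dataset
- Normalise the data
- Train using SVR from scikit-learn
Here is the code, using a pandas DataFrame filled with random values: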
from math import floor  # needed for the split indices below
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(20,5), columns=["A","B","C","D", "E"])
# every column except "A" is a feature; "A" is the target
a = list(df.columns.values)
a.remove("A")
X = df[a]
y = df["A"]
# first two thirds for training, the rest for testing
X_train = X.iloc[0: floor(2 * len(X) /3)]
X_test = X.iloc[floor(2 * len(X) /3):]
y_train = y.iloc[0: floor(2 * len(y) /3)]
y_test = y.iloc[floor(2 * len(y) /3):]
# normalise, then wrap each result back into a DataFrame
from sklearn import preprocessing
X_trainS = preprocessing.scale(X_train)
X_trainN = pd.DataFrame(X_trainS, columns=a)
X_testS = preprocessing.scale(X_test)
X_testN = pd.DataFrame(X_testS, columns=a)
y_trainS = preprocessing.scale(y_train)
y_trainN = pd.DataFrame(y_trainS)
y_testS = preprocessing.scale(y_test)
y_testN = pd.DataFrame(y_testS)
from sklearn.svm import SVR
clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
This gives the following error:
C:\Anaconda3\lib\site-packages\pandas\core\index.py:542: FutureWarning: slice indexers when using iloc should be integers and not floating point
  "and not floating point",FutureWarning)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in ()
     34 clf = SVR(kernel='rbf', C=1e3, gamma=0.1)
     35
---> 36 pred = clf.fit(X_trainN,y_trainN).predict(X_testN)
     37

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in fit(self, X, y, sample_weight)
    174
    175         seed = rnd.randint(np.iinfo('i').max)
--> 176         fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
    177         # see comment on the other call to np.iinfo in this file
    178

C:\Anaconda3\lib\site-packages\sklearn\svm\base.py in _dense_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
    229                 cache_size=self.cache_size, coef0=self.coef0,
    230                 gamma=self._gamma, epsilon=self.epsilon,
--> 231                 max_iter=self.max_iter, random_seed=random_seed)
    232
    233         self._warn_from_fit_status()

C:\Anaconda3\lib\site-packages\sklearn\svm\libsvm.pyd in sklearn.svm.libsvm.fit (sklearn\svm\libsvm.c:1864)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
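I don't know why this happens. Can anyone explain? I think it has something to do with converting back to a DataFrame after preprocessing.
The error here is in the df you are passing as the labels: y_trainN
Compare the y from the sample docs version with the one in your code. The docs pass a 1-D array, while y_trainN.values is a 2-D array with a single column: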
In [40]:
n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
print(y)
y_trainN.values
[ 1.76405235 0.40015721 0.97873798 2.2408932 1.86755799 -0.97727788
0.95008842 -0.15135721 -0.10321885 0.4105985 ]
Out[40]:
array([[-0.06680594],
[ 0.23535043],
[-1.49265082],
[ 1.22537862],
[-0.46499134],
[-0.23744759],
[ 1.40520679],
[ 0.95882677],
[ 1.66996413],
[-0.37515955],
[-0.75826444],
[-1.45945337],
[-0.63995369]])
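The same contrast can be checked directly through the shapes. This is a minimal sketch with made-up data; `y_1d` and `y_df` are illustrative names that do not appear in the question, and 13 is simply the length of the training split above.

```python
import numpy as np
import pandas as pd

# A 1-D target, like the one in the docs example.
y_1d = np.random.randn(13)

# Wrapping it in a DataFrame adds a second (column) axis.
y_df = pd.DataFrame(y_1d)

print(y_1d.shape)                    # (13,)
print(y_df.values.shape)             # (13, 1)

# Either way of getting back to a Series removes the extra axis.
print(y_df.squeeze().values.shape)   # (13,)
print(y_df[0].values.shape)          # (13,)
```

The `(13, 1)` shape is what the "expected 1, got 2" in the traceback is complaining about.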
So you can either call squeeze to turn the single-column df into a Series, or select the only column in the df, and the error goes away:
pred = clf.fit(X_trainN,y_trainN[0]).predict(X_testN)
or
pred = clf.fit(X_trainN,y_trainN.squeeze()).predict(X_testN)
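So we could argue that a df with only a single column should return something that can be coerced into a 1-D numpy array, or that numpy isn't calling the array attribute correctly, but really you should pass a Series, or select the column from the df, as the param.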
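Putting it together, here is a hedged end-to-end sketch of the question's pipeline with the target kept 1-D. It follows the question's steps (including scaling the test split on its own), but the fixed seed, the `features` and `split` names, and the use of `pd.Series` instead of `pd.DataFrame` for the target are choices made for this example, not part of the original post.

```python
from math import floor

import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.svm import SVR

# Random data with the same layout as in the question (seed is an assumption).
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(20, 5), columns=["A", "B", "C", "D", "E"])

features = [c for c in df.columns if c != "A"]
split = floor(2 * len(df) / 3)  # integer index, so iloc slicing is happy

X_train, X_test = df[features].iloc[:split], df[features].iloc[split:]
y_train = df["A"].iloc[:split]

# Scale the features and wrap them back into DataFrames, as in the question.
X_trainN = pd.DataFrame(preprocessing.scale(X_train), columns=features)
X_testN = pd.DataFrame(preprocessing.scale(X_test), columns=features)

# Keep the scaled target 1-D by wrapping it in a Series, not a DataFrame.
y_trainN = pd.Series(preprocessing.scale(y_train.to_numpy()))

clf = SVR(kernel="rbf", C=1e3, gamma=0.1)
pred = clf.fit(X_trainN, y_trainN).predict(X_testN)
print(pred.shape)
```

Since the target never becomes a two-dimensional object, there is nothing to squeeze before calling `fit`.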