Scikit-learn GridSearchCV - Why am I receiving a data type error when I execute grid.fit()?
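I've been working on a machine learning project in Python. After getting a basic neural network running, I'm trying to set up a grid search to optimize the parameters using the GridSearchCV function from sklearn. The grid.fit(X,Y) call throws this error: TypeError: only size-1 arrays can be converted to Python scalars. My interpretation is that the fit function doesn't like the format of the X and Y I'm giving it. This confuses me because the network runs fine without the grid search, and I haven't changed the network or the data at all. Can anyone explain what is happening here and how I can fix it?
This code creates the network and the grid search: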
#Creating the neural network
def create_model():
    model=Sequential()
    model.add(Dense(512, activation='relu',input_shape=(2606,)))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='relu'))
    opt=optimizers.Adam(lr=learn_rate)
    model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
    #I commented this out because I believe it is delegated to the grid.fit() fn later on.
    #model.fit(X_train, Y_train, batch_size=30, epochs=6000, verbose=1)
    return model
#Now setting up the grid search
model=KerasClassifier(build_fn=create_model())
learn_rate=np.arange(.00001,.001,.00002).tolist()
batch_size=np.arange(10,2606,2).tolist()
epochs=np.arange(1000,10000,100).tolist()
param_grid=dict(learn_rate=learn_rate, batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_results=grid.fit(X_train,Y_train) #This is the line referenced in the error message.
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
Any advice would be greatly appreciated!
Edit: X_train has shape (167, 2606). Each of the 167 elements is an array of length 2606, which is why the network's input_shape is (2606,). Y_train has shape (167,).
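So, the problem is that GridSearchCV creates a new model with new parameters for each combination. You are passing it an already-created model along with a list of parameters. I believe this is the source of the array-versus-scalar error. Below, I've changed your code (using some junk sample data) so that it runs.
The main change to note is that I changed the create_model signature to accept the parameter values you pass to GridSearchCV. I also removed the assignment of the KerasClassifier instance to the variable model and instead placed that call directly in GridSearchCV as the estimator.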
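To make the difference between passing create_model() and passing create_model concrete, here is a minimal sketch that needs no Keras at all. The create_model below is a hypothetical stand-in that only records the learning rate, and the float() conversion is an assumption meant to mimic an optimizer that expects a single scalar. It is not how KerasClassifier is implemented internally.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid


def create_model(learn_rate):
    """Hypothetical stand-in for the Keras builder: it only records the rate."""
    # An optimizer needs one scalar learning rate. float() on a multi-element
    # array raises TypeError, which mimics the array-versus-scalar failure.
    return {"learn_rate": float(np.asarray(learn_rate))}


learn_rate = [1e-5, 3e-5, 5e-5]

# Calling the builder up front hands it the whole list at once, so it fails.
try:
    create_model(learn_rate)
    failed = False
except TypeError:
    failed = True

# Passing the function itself lets the caller build one model per combination,
# each time with a single scalar value taken from the grid.
build_fn = create_model
models = [build_fn(**params) for params in ParameterGrid({"learn_rate": learn_rate})]

print(failed, len(models))
```

The point of the sketch is only that the builder has to stay uncalled until a single parameter combination is available.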
import numpy as np
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import optimizers
from sklearn.model_selection import GridSearchCV
#Creating the neural network
def create_model(learn_rate, batch_size, epochs):
    model=Sequential()
    model.add(Dense(512, activation='relu',input_shape=(2606,)))
    model.add(Dense(256, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(1, activation='relu'))
    opt=optimizers.Adam(lr=learn_rate)
    model.compile(optimizer=opt, loss='mean_squared_error', metrics=['accuracy'])
    #I commented this out because I believe it is delegated to the grid.fit() fn later on.
    #model.fit(X_train, Y_train, batch_size=30, epochs=6000, verbose=1)
    return model
#Now setting up the grid search
X_train = np.empty((167,2606), dtype=float, order='C')
Y_train = np.empty((167,), dtype=float, order='C')
learn_rate=np.arange(.00001,.001,.00002).tolist()
batch_size=np.arange(10,2606,2).tolist()
epochs=np.arange(1000,10000,100).tolist()
param_grid=dict(learn_rate=learn_rate, batch_size=batch_size, epochs=epochs)
grid = GridSearchCV(estimator=KerasClassifier(build_fn=create_model),
                    param_grid=param_grid, n_jobs=-1, cv=3)
grid_results=grid.fit(X_train,Y_train) #This is the line referenced in the error message.
#Use the same name that grid.fit() was assigned to (grid_results, not grid_result).
print("Best: %f using %s" % (grid_results.best_score_, grid_results.best_params_))
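Since GridSearchCV builds and trains a fresh model for every combination, it may be worth counting the combinations before starting the search. The sketch below uses the same three ranges as the code above with sklearn's ParameterGrid. The variable names n_combinations and n_fits are just illustrative.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same ranges as in the grid search above.
param_grid = dict(
    learn_rate=np.arange(.00001, .001, .00002).tolist(),
    batch_size=np.arange(10, 2606, 2).tolist(),
    epochs=np.arange(1000, 10000, 100).tolist(),
)

# len() multiplies the list lengths; it does not enumerate the grid.
n_combinations = len(ParameterGrid(param_grid))

# With cv=3, every combination is trained once per fold.
n_fits = n_combinations * 3

print(n_combinations, n_fits)
```

If the count turns out to be far larger than is practical to train, a coarser grid or sklearn's RandomizedSearchCV is a common alternative.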