Multi-output regression using skorch & sklearn pipeline gives runtime error due to dtype
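I want to do multi-output regression with skorch. I created a small toy example, shown below, in which the NN should predict 5 outputs. I also want to use a preprocessing step that is incorporated via an sklearn pipeline (PCA in this example, but it could be any other preprocessor). When I run the example, I get the following error in torch's Variable._execution_engine.run_backward step: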
RuntimeError: Found dtype Double but expected Float
Am I forgetting something? I suspect something has to be cast somewhere, but since skorch handles much of the PyTorch machinery, I can't see what or where.
Example:
import torch
import skorch
from sklearn.datasets import make_classification, make_regression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.decomposition import PCA

X, y = make_regression(n_samples=1000, n_features=40, n_targets=5)
# Only X is cast here; y is left as returned by make_regression.
X = X.astype('float32')

class RegressionModule(torch.nn.Module):
    def __init__(self, input_dim=80):
        super().__init__()
        self.l0 = torch.nn.Linear(input_dim, 10)
        self.l1 = torch.nn.Linear(10, 5)

    def forward(self, X):
        y = self.l0(X)
        y = self.l1(y)
        return y

class InputShapeSetter(skorch.callbacks.Callback):
    # Set the module's input size from the (already preprocessed) training data.
    def on_train_begin(self, net, X, y):
        net.set_params(module__input_dim=X.shape[-1])

net = skorch.NeuralNetRegressor(
    RegressionModule,
    callbacks=[InputShapeSetter()],
)
pipe = make_pipeline(PCA(n_components=10), net)
pipe.fit(X, y)
print(pipe.predict(X))
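One thing that may be worth checking in the example above is the dtype of the target. The sketch below only inspects dtypes and does not train anything. It assumes that the mismatch comes from a float64 `y` meeting the float32 network output in the loss, which is a plausible reading of the error but is not confirmed by the post. The name `y32` and the small sample sizes are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_regression

# Small stand-in for the data in the example (sizes are arbitrary).
X, y = make_regression(n_samples=10, n_features=4, n_targets=5, random_state=0)

# Same cast as in the example: only X is converted.
X = X.astype(np.float32)
print(X.dtype.name, y.dtype.name)  # X is float32, y is still float64

# Assumption: casting the target as well removes the Double/Float mismatch.
y32 = y.astype(np.float32)
print(X.dtype.name, y32.dtype.name)
```

If this assumption holds, passing the float32 target to `pipe.fit` instead of the original `y` would be the corresponding change in the example.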
Edit 1:
As this example shows, casting X to float32 at the start does not work for every preprocessor:
import pandas as pd
import torch
import skorch
from sklearn.datasets import make_classification, make_regression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.decomposition import PCA
from category_encoders import OneHotEncoder

X, y = make_regression(n_samples=1000, n_features=40, n_targets=5)
X = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
# Turn one numeric column into a categorical one so the encoder has work to do.
X['feature_1'] = pd.qcut(X['feature_1'], 3, labels=["good", "medium", "bad"])
y = y.astype('float32')

class RegressionModule(torch.nn.Module):
    def __init__(self, input_dim=80):
        super().__init__()
        self.l0 = torch.nn.Linear(input_dim, 10)
        self.l1 = torch.nn.Linear(10, 5)

    def forward(self, X):
        y = self.l0(X)
        y = self.l1(y)
        return y

class InputShapeSetter(skorch.callbacks.Callback):
    def on_train_begin(self, net, X, y):
        net.set_params(module__input_dim=X.shape[-1])

net = skorch.NeuralNetRegressor(
    RegressionModule,
    callbacks=[InputShapeSetter()],
)
pipe = make_pipeline(OneHotEncoder(cols=['feature_1'], return_df=False), net)
pipe.fit(X, y)
print(pipe.predict(X))
By default, OneHotEncoder returns a numpy array with dtype=float64. So one can simply cast the input data X when it is fed into the model's forward():
class RegressionModule(torch.nn.Module):
    def __init__(self, input_dim=80):
        super().__init__()
        self.l0 = torch.nn.Linear(input_dim, 10)
        self.l1 = torch.nn.Linear(10, 5)

    def forward(self, X):
        X = X.to(torch.float32)
        y = self.l0(X)
        y = self.l1(y)
        return y
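As an alternative to casting inside forward(), the cast could also be done as its own pipeline step, so the module stays unchanged. The following is a minimal sketch using sklearn's `FunctionTransformer`. The names `to_float32` and `cast_step` are hypothetical, and `StandardScaler` merely stands in for whatever preprocessor returns float64.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def to_float32(X):
    """Cast an array-like to a float32 numpy array (hypothetical helper)."""
    return np.asarray(X, dtype=np.float32)

# Stateless transformer that only changes the dtype.
cast_step = FunctionTransformer(to_float32)

# float64 input, as most preprocessors would produce.
X64 = np.random.RandomState(0).rand(8, 3)

# StandardScaler is a stand-in for any preprocessor returning float64.
preprocess = make_pipeline(StandardScaler(), cast_step)
X32 = preprocess.fit_transform(X64)

print(X64.dtype.name, X32.dtype.name)
```

In the setting of the question, such a step would sit between the preprocessor and the net, for example `make_pipeline(OneHotEncoder(...), cast_step, net)`. Note that a pipeline step only transforms X, so the target y would still need to be cast to float32 before calling `fit`.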