Sparse Matrix error in MLPRegressor
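Context
I'm running into an error when trying to use a sparse matrix as input to sklearn.neural_network.MLPRegressor. Nominally, this method is able to handle sparse matrices. I think this might be a bug in scikit-learn, but I wanted to check here before submitting an issue.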
Problem
When passing a scipy.sparse input to sklearn.neural_network.MLPRegressor, I get:
ValueError: input must be a square array
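The error is raised by the matrix_power function in numpy.matrixlib.defmatrix. This seems to happen because matrix_power passes the sparse matrix to numpy.asanyarray (L137), which returns an array of size=1, ndim=0 containing the sparse matrix object. matrix_power then performs some dimension checks (L138-141) to make sure the input is a square matrix, and these fail because the array returned by numpy.asanyarray is not square, even though the underlying sparse matrix is square.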
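As far as I can tell, the problem is that numpy.asanyarray prevents the dimensions of the sparse matrix from being determined. The sparse matrix itself has a shape attribute that would let it pass the dimension checks, but only if it is not run through asanyarray.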
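The shape behaviour described above can be reproduced in isolation. This is only an illustrative sketch: looks_square is a hypothetical helper that mimics a "must be a 2-D square array" check of the kind described; it is not numpy's actual matrix_power source.

```python
import numpy as np
from scipy import sparse


def looks_square(a):
    """Hypothetical helper mimicking a square-matrix check:
    the input must be 2-D with equal row and column counts."""
    arr = np.asanyarray(a)
    return arr.ndim == 2 and arr.shape[0] == arr.shape[1]


# A 3x3 sparse matrix: its own .shape attribute reports a square matrix.
m = sparse.csr_matrix(np.eye(3))
print(m.shape)  # (3, 3)

# np.asanyarray does not densify the sparse matrix; it wraps the whole
# object in a 0-d array of dtype object, so the 2-D check cannot pass.
wrapped = np.asanyarray(m)
print(wrapped.ndim, wrapped.dtype)  # 0 object

print(looks_square(m))            # False
print(looks_square(m.toarray()))  # True: the dense copy is a real 3x3 array
```

Converting explicitly with .toarray() before any dense-only numpy routine is the usual way around this wrapping.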
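I think this might be a bug, but I don't want to go ahead and file an issue until I've confirmed that I'm not just being an idiot! Please take a look at the example below.
If it is a bug, where would be the most appropriate place to raise the issue? NumPy? SciPy? Or scikit-learn?
Minimal example
Environment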
Arch Linux
kernel 4.15.7-1
Python 3.6.4
numpy 1.14.1
scipy 1.0.0
sklearn 0.19.1
Code
import numpy as np
from scipy import sparse
from sklearn import model_selection
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn.neural_network import MLPRegressor
## Generate some synthetic data
def fW(A, B, C):
    return A * np.random.normal(.3, .1) + B * np.random.normal(.6, .1)
def fX(A, B, C):
    return B * np.random.normal(-1, .1) + A * np.random.normal(-.9, .1) / C
# independent variables
N = int(1e4)
A = np.random.uniform(2, 12, N)
B = np.random.uniform(2, 12, N)
C = np.random.uniform(2, 12, N)
# synthetic data
mW = fW(A, B, C)
mX = fX(A, B, C)
# combine datasets
real = np.vstack([A, B, C]).T
meas = np.vstack([mW, mX]).T
# add noise to meas
meas *= np.random.normal(1, 0.0001, meas.shape)
## Make data sparse
prob_null = 0.2
real[np.random.choice([True, False], real.shape, p=[prob_null, 1-prob_null])] = np.nan
meas[np.random.choice([True, False], meas.shape, p=[prob_null, 1-prob_null])] = np.nan
# NB: problem persists whichever sparse matrix method is used.
real = sparse.csr_matrix(real)
meas = sparse.csr_matrix(meas)
# replace missing values with mean
rmnan = Imputer()
real = rmnan.fit_transform(real)
meas = rmnan.fit_transform(meas)
# split into test/training sets
real_train, real_test, meas_train, meas_test = model_selection.train_test_split(real, meas, test_size=0.3)
# create scalers and apply to data
real_scaler = StandardScaler(with_mean=False)
meas_scaler = StandardScaler(with_mean=False)
real_scaler.fit(real_train)
meas_scaler.fit(meas_train)
treal_train = real_scaler.transform(real_train)
tmeas_train = meas_scaler.transform(meas_train)
treal_test = real_scaler.transform(real_test)
tmeas_test = meas_scaler.transform(meas_test)
nn = MLPRegressor((100,100,10), solver='lbfgs', early_stopping=True, activation='tanh')
nn.fit(tmeas_train, treal_train)
## ERROR RAISED HERE
## The problem:
# the sparse matrix has a shape attribute that would pass the square matrix validation
tmeas_train.shape
# but not after it's been through asanyarray
np.asanyarray(tmeas_train).shape
As given in the documentation, MLPRegressor.fit() supports sparse matrices for X, but not for y:
Parameters:
X : array-like or sparse matrix, shape (n_samples, n_features)
The input data.
y : array-like, shape (n_samples,) or (n_samples, n_outputs)
The target values (class labels in classification, real numbers in regression).
I was able to run your code successfully with:
nn.fit(tmeas_train, treal_train.toarray())
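The same workaround can be wrapped in a small guard so that a sparse target is always densified before fitting. This is a sketch under assumptions: dense_target is a hypothetical helper (not part of scikit-learn), and the MLPRegressor call is left as a comment so the snippet runs with only numpy and scipy installed.

```python
import numpy as np
from scipy import sparse


def dense_target(y):
    """Hypothetical helper: return y as a plain dense ndarray.

    MLPRegressor.fit documents X as "array-like or sparse matrix" but y only
    as "array-like", so a scipy.sparse target is converted with .toarray().
    """
    if sparse.issparse(y):
        return y.toarray()
    # np.asarray also strips ndarray subclasses such as np.matrix.
    return np.asarray(y)


# X may stay sparse; only the target needs converting.
X = sparse.csr_matrix(np.eye(20, 2))
y = sparse.csr_matrix(np.arange(60, dtype=float).reshape(20, 3))

y_dense = dense_target(y)
print(sparse.issparse(X))                     # True
print(type(y_dense).__name__, y_dense.shape)  # ndarray (20, 3)

# With scikit-learn installed, the fit call would then be, e.g.:
# MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs").fit(X, y_dense)
```

Densifying y is cheap here because the target has only a few columns; X keeps its sparse format, which the documentation says fit() accepts.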