XGBoost: Check failed: valid: Input data contains `inf` or `nan`
I am trying to run XGBoost on Windows 10. The relevant part of my code looks like this:
import numpy as np
from xgboost import XGBClassifier

model = XGBClassifier()
print(x_train.shape)
print(y_train.shape)
print(np.isnan(x_train).any())
print(np.isnan(y_train).any())
print(np.isinf(x_train).any())
print(np.isinf(y_train).any())
print(np.isfinite(x_train).all())
print(np.isfinite(y_train).all())
model.fit(x_train, y_train)
It produces the following output:
(4116, 37)
(4116,)
False
False
False
False
True
True
The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
Traceback (most recent call last):
  [...]
    model.fit(x_train, y_train)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1173, in fit
    label_transform=label_transform,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 244, in _wrap_evaluation_matrices
    missing=missing,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\sklearn.py", line 1172, in <lambda>
    create_dmatrix=lambda **kwargs: DMatrix(nthread=self.n_jobs, **kwargs),
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 547, in __init__
    enable_categorical=enable_categorical,
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 565, in dispatch_data_backend
    feature_types)
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\data.py", line 169, in _from_numpy_array
    ctypes.c_int(nthread)))
  File "D:\Programs\Anaconda\lib\site-packages\xgboost\core.py", line 210, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [14:21:29] C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/data/data.cc:945: Check failed: valid: Input data contains `inf` or `nan`
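As the checks above show, my data does not contain any `inf` or `nan` values. Any ideas on how to proceed from here would be greatly appreciated.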
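I just ran into the same error. It seems to be caused by very large floating-point values (around 1e300) in the input. I fixed it by applying a log transform.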
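A likely explanation, as far as I can tell: XGBoost stores feature values as 32-bit floats, so a value that is finite in float64 but larger than the float32 maximum (about 3.4e38) overflows to `inf` during conversion. The sketch below reproduces that with NumPy alone. The array contents and the signed `log1p` transform are assumptions chosen for illustration, not the exact transform used above.

import numpy as np

# Hypothetical data: finite in float64, but one value exceeds the float32 range.
x_train = np.array([[1.0, 2.0], [3.0, 1e300]], dtype=np.float64)

# The float64 check from the question passes.
print(np.isfinite(x_train).all())   # True

# Casting to float32 turns out-of-range values into inf.
with np.errstate(over="ignore"):
    x32 = x_train.astype(np.float32)
print(np.isfinite(x32).all())       # False

# Find the columns that hold values outside the float32 range.
f32_max = np.finfo(np.float32).max
bad_cols = np.where((np.abs(x_train) > f32_max).any(axis=0))[0]
print(bad_cols)                     # [1]

# One possible fix: a signed log transform that also handles zeros and negatives.
x_log = np.sign(x_train) * np.log1p(np.abs(x_train))
print(np.isfinite(x_log.astype(np.float32)).all())   # True

Running the float32 check on your own `x_train` before calling `fit` should tell you whether this is the cause in your case.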
Using scikit-learn's StandardScaler solved my problem. Thanks to Antoine Messager's answer, I ended up doing the following:
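from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

model = XGBClassifier()
# Standardize each feature to zero mean and unit variance before fitting
scaler = StandardScaler()
x_trainScaled = scaler.fit_transform(x_train)
model.fit(x_trainScaled, y_train)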
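One thing to keep in mind with this approach: any data passed to `predict` later needs the same scaling, using the scaler fitted on the training set (`transform`, not `fit_transform`). A minimal sketch with made-up data follows; the arrays and the 1e40 magnitude are assumptions for illustration only, and the XGBoost call is left out so the snippet runs with NumPy and scikit-learn alone.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: the third feature is far outside the float32 range.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 3))
x_train[:, 2] *= 1e40
x_test = rng.normal(size=(20, 3))
x_test[:, 2] *= 1e40

# Unscaled, the float32 cast overflows to inf.
with np.errstate(over="ignore"):
    raw_ok = np.isfinite(x_train.astype(np.float32)).all()

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn mean/std from training data
x_test_scaled = scaler.transform(x_test)        # reuse the same mean/std

# After scaling, both sets survive the float32 cast.
train_ok = np.isfinite(x_train_scaled.astype(np.float32)).all()
test_ok = np.isfinite(x_test_scaled.astype(np.float32)).all()
print(raw_ok, train_ok, test_ok)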