SMOTE for balancing data
I am trying to train a GradientBoosting classifier. Since my data is imbalanced, I am considering using SMOTE to balance it.
I tried the following:
# A classifier is wanted here, so GradientBoostingClassifier rather than GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingClassifier
import numpy as np
import pandas as pd
# Import train_test_split function
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
y=df['Label']
# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, stratify=y)
sm = SMOTE(random_state = 42)
# fit_resample (called fit_sample in older imbalanced-learn releases)
X_train_oversampled, y_train_oversampled = sm.fit_resample(X_train, y_train)
X_train = pd.DataFrame(X_train_oversampled, columns=X_train.columns)
But I got this error:
---> 20 X_train = pd.DataFrame(X_train_oversampled, columns=X_train.columns)
/anaconda3/lib/python3.7/site-packages/scipy/sparse/base.py in __getattr__(self, attr)
689 return self.getnnz()
690 else:
--> 691 raise AttributeError(attr + " not found")
692
693 def transpose(self, axes=None, copy=False):
AttributeError: columns not found
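The failing lookup can be reproduced in isolation. The sketch below assumes, as the scipy/sparse path in the traceback suggests, that X (and therefore X_train) is a scipy sparse matrix; the toy values and the column names "a" and "b" are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Stand-in for X_train in the question (assumption: X is a scipy sparse matrix)
X_train = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))

# Sparse matrices have a shape and a dtype, but no column labels
print(type(X_train).__name__, X_train.shape)  # csr_matrix (2, 2)
print(hasattr(X_train, "columns"))            # False

# A DataFrame built from the same values does carry column labels
df_train = pd.DataFrame(X_train.toarray(), columns=["a", "b"])
print(list(df_train.columns))                 # ['a', 'b']
```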
I don't know what I should replace, or how to use SMOTE with X_train and y_train. Could you advise on how to use it in the correct order?
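You haven't provided enough code or data, or the full traceback, to be certain, but the error on the last line suggests that SMOTE worked fine. The error occurs because X_train is a sparse array (a scipy sparse matrix), which has no column names and therefore no attribute columns. It looks like you had column names at some point, so you should be able to retrieve them from df.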