Python oversampling: combine several samplers in a pipeline
My problem concerns a ValueError raised by the SMOTE class:
Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6
# imbalanced-learn is a package containing an implementation of SMOTE
from imblearn.over_sampling import SMOTE, ADASYN, RandomOverSampler
from imblearn.pipeline import Pipeline
# label column (the first column)
y = feature_set.iloc[:, 0]
# feature matrix: everything except the text and label columns
x = feature_set.loc[:, feature_set.columns != 'text_column']
x = x.loc[:, x.columns != 'label_column']
# raises: Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6
x_resampled, y_resampled = SMOTE().fit_resample(x, y)
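After some investigation, I found that some of my classes (158 in total) are extremely undersampled; the error above comes from a class with only a single sample.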
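The error comes from SMOTE's neighbour search. Assuming the default `k_neighbors=5`, SMOTE looks up `k_neighbors + 1 = 6` nearest neighbours inside each class it over-samples (the sample itself counts as one of them), which is why the message reports `n_neighbors = 6`. Any such class with fewer than 6 samples therefore breaks SMOTE. A minimal pure-Python sketch that lists those classes from the label vector; the function name `classes_too_small_for_smote` is hypothetical and not part of imbalanced-learn:

```python
from collections import Counter


def classes_too_small_for_smote(y, k_neighbors=5):
    """Hypothetical helper: return {class: count} for classes SMOTE cannot handle.

    SMOTE queries k_neighbors + 1 neighbours within a class (the sample
    itself plus k_neighbors others), so a class needs at least that many samples.
    """
    counts = Counter(y)
    min_required = k_neighbors + 1
    return {cls: n for cls, n in counts.items() if n < min_required}


labels = ["a"] * 10 + ["b"] * 6 + ["c"] * 1
print(classes_too_small_for_smote(labels))  # {'c': 1}
```

Running it on `y` shows which of the 158 classes need special treatment before SMOTE.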
Following the solution proposed in this post:
Create a pipeline that is using SMOTE and RandomOversampler in a way that satisfies the condition n_neighbors <= n_samples for smoted classes and uses random oversampling when the condition is not satisfied.
However, I am still struggling to get my experiment up and running.
# initialize oversamplers
smote = SMOTE()
randomSampler = RandomOverSampler()
# create a pipeline (steps run in order, so SMOTE runs before RandomOverSampler)
pipeline = Pipeline([('smote', smote), ('randomSampler', randomSampler)])
pipeline.fit_resample(x, y)
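When I run it, I still get the same error. My guess is that the resulting pipeline applies both samplers, whereas I only need to apply one of them at a time, based on a predefined condition (RandomSampler if the number of items is smaller than X, otherwise SMOTE).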
Is there a way to set a condition so that RandomSampler is called when the number of items is extremely small?
Thanks in advance.
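One way to get the "random oversampling only when SMOTE cannot work" behaviour is to keep both samplers, but run `RandomOverSampler` first with a dict `sampling_strategy` that only tops up the tiny classes to `k_neighbors + 1` samples, and let SMOTE balance everything afterwards. This is a sketch of that workaround under the assumption of SMOTE's default `k_neighbors=5`, not a built-in imbalanced-learn feature; the helper `random_topup_strategy` is hypothetical and only uses the standard library:

```python
from collections import Counter


def random_topup_strategy(y, k_neighbors=5):
    """Hypothetical helper: sampling_strategy dict for a first RandomOverSampler step.

    Classes with fewer than k_neighbors + 1 samples are duplicated up to exactly
    k_neighbors + 1 samples; larger classes are not listed, so the random
    over-sampler leaves them untouched for SMOTE to handle afterwards.
    """
    min_required = k_neighbors + 1  # SMOTE's neighbour search includes the sample itself
    return {cls: min_required for cls, n in Counter(y).items() if n < min_required}


labels = ["a"] * 10 + ["b"] * 6 + ["c"] * 1
print(random_topup_strategy(labels))  # {'c': 6}
```

You would then use `RandomOverSampler(sampling_strategy=random_topup_strategy(y))` followed by `SMOTE(k_neighbors=5)` in the imblearn `Pipeline`, skipping the first step if the dict is empty. Because the dict is computed once from `y`, it only matches data with those exact class counts (for example a direct `fit_resample(x, y)` call), not every cross-validation fold. A class topped up from a single sample consists of identical copies, so SMOTE can only reproduce that one point for it.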
I ran into the same problem as you (Expected n_neighbors <= n_samples, but n_samples = 1, n_neighbors = 6) and, like you, read and followed that person's advice.
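I think you get the same error because you put the random oversampler after the SMOTE step. In other words, you need to oversample the minority classes before applying the SMOTE algorithm.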
This worked for me:
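from imblearn.over_sampling import RandomOverSampler, SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts sampler steps
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# random oversampling runs first, so SMOTE never sees a class with too few samples
pipe = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('ros', RandomOverSampler()),
    ('oversampler', SMOTE()),
    ('clf', LinearSVC()),
])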
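One caveat, as I understand imbalanced-learn's documented defaults: `RandomOverSampler()` uses `sampling_strategy='auto'`, which for over-samplers means `'not majority'`, i.e. every class is raised to the majority count. After that step the classes are already balanced, so the `SMOTE()` step in the pipeline above has nothing left to generate; the pipeline runs, but the added samples are duplicates rather than synthetic ones. The pure-Python sketch below only mimics that counting (it does not call imblearn); the helper names are hypothetical:

```python
from collections import Counter


def samples_to_generate(counts):
    """Samples an 'auto' ('not majority') over-sampler would add per class.

    Every class smaller than the majority is raised to the majority count;
    classes already at that count are left out.
    """
    n_major = max(counts.values())
    return {cls: n_major - n for cls, n in counts.items() if n < n_major}


def after_auto_oversampling(counts):
    """Class counts after such an over-sampler: all classes equal the majority."""
    n_major = max(counts.values())
    return {cls: n_major for cls in counts}


counts = dict(Counter(["a"] * 10 + ["b"] * 6 + ["c"] * 1))

# Step 1, RandomOverSampler(): duplicates samples for b and c
print(samples_to_generate(counts))  # {'b': 4, 'c': 9}

# Step 2, SMOTE() on the already balanced counts: nothing left to generate
print(samples_to_generate(after_auto_oversampling(counts)))  # {}
```

If you want SMOTE to create most of the new samples, give the `RandomOverSampler` step a dict `sampling_strategy` that only raises the tiny classes to `k_neighbors + 1` samples.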