matplotlib: histogram of SMOTEd class distribution showing colored synthetic region
Suppose I have a binary imbalanced dataset like this:
from collections import Counter
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from imblearn.over_sampling import SMOTE
# fake dataset
X, y = make_classification(n_samples=10000, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.99], flip_y=0,
                           random_state=1)
# summarize class distribution
counter = Counter(y)
print(counter)
Counter({0: 9900, 1: 100})
Using SMOTE to oversample the minority class:
oversample = SMOTE()
Xs, ys = oversample.fit_resample(X, y)
Now, showing the histograms of the class distribution:
a. Before oversampling:
plt.hist(y)
b. After oversampling:
plt.hist(ys)
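But in the oversampled plot I would like to show, in a different color, the part of the minority class that was synthetically generated.
Expected output:
Something like the figure below: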
You can use plt.bar to draw bar plots. If you draw two bar plots in the same subplot, the first one remains partly visible.
import matplotlib.pyplot as plt
import numpy as np
# simulate before oversampling
y = np.random.choice([0, 1], 1000, p=[.95, .05])
# simulate after oversampling
ys = np.append(y, np.ones(sum(y == 0) - sum(y == 1), dtype=int))
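# draw the oversampled counts first; the minority bar gets a different color
plt.bar([0, 1], height=[sum(ys == 0), sum(ys == 1)], color=['cornflowerblue', 'lime'])
# overlay the original counts; the part of the minority bar left uncovered is the synthetic share
plt.bar([0, 1], height=[sum(y == 0), sum(y == 1)], color='cornflowerblue')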
plt.xticks([0, 1])
plt.show()
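As a variation on the overlay above, the synthetic share can also be drawn as its own bar stacked on top of the original counts with the `bottom` argument of `bar`, which makes a legend entry possible. This is only a sketch: the helper name `plot_synthetic_share` is made up for illustration, and it assumes the class labels are non-negative integers (as in the question) and that the oversampled labels contain all of the original samples plus the synthetic ones.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_synthetic_share(y_before, y_after, ax=None):
    """Stacked bars per class: original counts, with synthetic counts on top.

    Hypothetical helper; assumes integer labels 0..k-1 and that y_after
    holds every original sample plus the synthetic ones.
    """
    y_before = np.asarray(y_before)
    y_after = np.asarray(y_after)
    n_classes = int(max(y_before.max(), y_after.max())) + 1

    # per-class counts before oversampling, and the surplus added afterwards
    original = np.bincount(y_before, minlength=n_classes)
    synthetic = np.bincount(y_after, minlength=n_classes) - original

    if ax is None:
        ax = plt.gca()
    classes = np.arange(n_classes)
    ax.bar(classes, original, color='cornflowerblue', label='original')
    # bottom=original stacks the synthetic counts on top of the original bars
    ax.bar(classes, synthetic, bottom=original, color='lime', label='synthetic')
    ax.set_xticks(classes)
    ax.set_xlabel('class')
    ax.set_ylabel('count')
    ax.legend()
    return original, synthetic
```

With the arrays from the question, calling `plot_synthetic_share(y, ys)` followed by `plt.show()` should give a full-height bar for class 0 and a bar for class 1 that is mostly in the synthetic color.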