Before oversampling, data showing 0
I'm working on my dataset and I'm new to this. Here is the code:
from sklearn.model_selection import train_test_split

class_col_name = 'Creditability'
feature_names = df.columns[df.columns != class_col_name]
# 70% training and 30% test
X_train, X_test, y_train, y_test = train_test_split(df.loc[:, feature_names], df[class_col_name], test_size=0.3, random_state=1)
print("Number transactions X_train dataset: ", X_train.shape)
print("Number transactions y_train dataset: ", y_train.shape)
print("Number transactions X_test dataset: ", X_test.shape)
print("Number transactions y_test dataset: ", y_test.shape)
print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))
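I'm trying to apply oversampling to my dataset, but when I count the labels before oversampling, the output shows 0 for both, even though the shapes show that the dataset does contain data.
Here is the output: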
Number transactions X_train dataset: (700, 20)
Number transactions y_train dataset: (700,)
Number transactions X_test dataset: (300, 20)
Number transactions y_test dataset: (300,)
Before OverSampling, counts of label '1': 0
Before OverSampling, counts of label '0': 0
I'm trying to understand this output and how to deal with it.
You may want to confirm that the possible class labels are actually 0 and 1. You can try
print(y_train.unique())
to check what the class labels are.
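If y_train is a pandas Series with labels in [0, 1], then I believe the counts printed by the last two lines should together add up to the number of training labels (700 here). If the labels are not the integers 0 or 1, that would explain why both sums are 0.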
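To illustrate the point, here is a minimal sketch of how both counts can come out as 0. It assumes, purely for illustration, that the labels were loaded as the strings "1" and "0"; the five-element Series below is made-up data, not the asker's, and the real cause may be different (for example, labels coded as 1 and 2).

```python
import pandas as pd

# Hypothetical stand-in for y_train: labels stored as strings, not integers.
# dtype=object is set explicitly so the comparison below behaves the same
# across pandas versions.
y_train = pd.Series(["1", "0", "1", "1", "0"], dtype=object, name="Creditability")

# Inspect what the labels really are before counting them.
print(y_train.dtype)
print(y_train.unique())
print(y_train.value_counts())

# Comparing string labels with the integer 1 matches nothing.
count_one_before = int((y_train == 1).sum())

# After converting to integers, the counts add up to the number of labels.
y_fixed = y_train.astype(int)
count_one = int((y_fixed == 1).sum())
count_zero = int((y_fixed == 0).sum())
print(count_one_before, count_one, count_zero)
```

If value_counts() shows other label values entirely, compare against those values instead of converting the type.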