Upweighting, or adding weight to the downsampled examples
Hi, I have downsampled my dataset and I need help with upweighting, i.e. adding a weight to the downsampled examples. See the code below.
import pandas as pd
from sklearn.utils import resample

# `data` is assumed to be an already-loaded DataFrame with a Collected_ind column
# Separating majority and minority classes
df_majority = data[data.Collected_ind == 1]
df_minority = data[data.Collected_ind == 0]
# Downsample majority class
df_majority_downsampled = resample(df_majority,
                                   replace=False,     # sample without replacement
                                   n_samples=152664,  # to match minority class
                                   random_state=1)    # reproducible results
# Combining minority class with downsampled majority class
df_downsampled = pd.concat([df_majority_downsampled, df_minority])
# Display new class counts
df_downsampled.Collected_ind.value_counts()
df_downsampled['Collected_ind'].value_counts()
df_downsampled['Collected_ind'].value_counts(normalize=True)
#Randomly shuffle the rows.
df_downsampled = df_downsampled.sample(frac=1)
df_downsampled.to_csv("Sampled_Data.csv", index=False)
#Generate a train and test dataset
train = df_downsampled.sample(frac=0.8)
test = df_downsampled.drop(train.index)
train.to_csv("trainNew.csv", index=False)
test.to_csv("testNew.csv", index=False)
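Your question actually helped me answer my own, because I was looking for this syntax. Since I'm here anyway, I'll tell you what I'm doing. I don't know whether your definition of weight is the same as mine, but this is how we define it: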
class_weight = (original_class_count/original_row_count) / (new_class_count/new_row_count)
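To make the formula concrete, here is a minimal sketch that evaluates it for one class. The function name and the counts (900 majority rows out of 1000, downsampled to 100 out of 200) are made up for illustration and are not taken from the question's data.

```python
def class_weight(original_class_count, original_row_count,
                 new_class_count, new_row_count):
    """Ratio of a class's share before downsampling to its share after."""
    original_share = original_class_count / original_row_count
    new_share = new_class_count / new_row_count
    return original_share / new_share


# Hypothetical counts: the majority class was 900 of 1000 rows,
# and is 100 of 200 rows after downsampling.
majority_weight = class_weight(900, 1000, 100, 200)
# The minority class was 100 of 1000 rows and is still 100 rows, now of 200.
minority_weight = class_weight(100, 1000, 100, 200)
print(majority_weight, minority_weight)
```

With this definition a class that was shrunk relative to the original data gets a weight above 1, and a class whose share grew gets a weight below 1.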
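So, to rework your code, I would replace the hard-coded n_samples with len(df_minority), and then add the formula above as a column in the DataFrame, computed dynamically from the lengths of the various DataFrames.
Maybe something like: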
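import numpy as np

# After downsampling, each class has len(df_minority) rows out of len(df_minority) * 2
df_downsampled['weight'] = np.where(
    df_downsampled['Collected_ind'] == 1,
    (len(df_majority) / len(data)) / (len(df_minority) / (len(df_minority) * 2)),
    (len(df_minority) / len(data)) / (len(df_minority) / (len(df_minority) * 2)))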
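Putting the pieces together, the following is a rough end-to-end sketch on synthetic data, not a drop-in for the real dataset. The helper name downsample_with_weights, the feature column, and the 900/100 class split are all assumptions made for the example. It also swaps sklearn's resample for pandas' DataFrame.sample (sampling without replacement) to keep the dependencies down, and computes the class shares with value_counts(normalize=True) instead of spelling out the lengths.

```python
import numpy as np
import pandas as pd


def downsample_with_weights(data, label_col="Collected_ind",
                            majority_label=1, random_state=1):
    """Downsample the majority class to the minority size and add a weight column."""
    df_majority = data[data[label_col] == majority_label]
    df_minority = data[data[label_col] != majority_label]

    # Sample the majority class without replacement, matching the minority size.
    df_majority_downsampled = df_majority.sample(
        n=len(df_minority), replace=False, random_state=random_state)
    df_downsampled = pd.concat([df_majority_downsampled, df_minority])

    # weight = (share of the class before) / (share of the class after)
    original_share = data[label_col].value_counts(normalize=True)
    new_share = df_downsampled[label_col].value_counts(normalize=True)
    df_downsampled["weight"] = df_downsampled[label_col].map(
        original_share / new_share)
    return df_downsampled


# Synthetic stand-in for the real data: 900 majority rows, 100 minority rows.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "feature": rng.normal(size=1000),
    "Collected_ind": [1] * 900 + [0] * 100,
})

df_downsampled = downsample_with_weights(data)
weights_by_class = df_downsampled.groupby("Collected_ind")["weight"].first()
print(weights_by_class)
```

If the model you train accepts per-row weights, the weight column is what you would pass in; whether and how it does depends on the library, so check its documentation.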