How to apply class weights in a linear classifier for binary classification?
This is the linear classifier I use for binary classification. Here is the code snippet:
my_optimizer = tf.train.AdagradOptimizer(learning_rate = learning_rate)
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer,5.0)
# Create a linear classifier object
linear_classifier = tf.estimator.LinearClassifier(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)
linear_classifier.train(input_fn = training_input_fn, steps = steps)
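The dataset is imbalanced and has only two classes, yes and no. There are 36548 examples of the NO class and 4640 examples of the YES class.
How can I balance this data? I have been searching and can find material about class weights and the like, but I cannot find how to create class weights or how to apply them to TensorFlow's train method.
This is how I compute the loss: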
training_probabilities = linear_classifier.predict(input_fn = training_predict_input_fn)
training_probabilities = np.array([item['probabilities'] for item in training_probabilities])
validation_probabilities = linear_classifier.predict(input_fn=validation_predict_input_fn)
validation_probabilities = np.array([item['probabilities'] for item in validation_probabilities])
training_log_loss = metrics.log_loss(training_targets, training_probabilities)
validation_log_loss = metrics.log_loss(validation_targets, validation_probabilities)
I assume you are using sklearn to compute your loss. If so, you can add class weights through the sample_weight argument, passing an array that contains the weight for each data point. sample_weight is a rolled-out version of class_weight: every sample carries the weight of its class. You can compute the sample_weight array with sklearn's compute_sample_weight utility and pass it to the log_loss function.
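To make "rolled out" concrete, here is a minimal pure-Python sketch of the "balanced" heuristic, which weights each class by n_samples / (n_classes * class_count). The helper names balanced_class_weights and roll_out are hypothetical, and the labels are synthesized from the class counts in the question (0 for NO, 1 for YES).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weight: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def roll_out(labels, class_weights):
    """Expand per-class weights into one weight per sample."""
    return [class_weights[y] for y in labels]


# Synthetic labels matching the counts in the question:
# 36548 NO examples (0) and 4640 YES examples (1).
labels = [0] * 36548 + [1] * 4640

class_weights = balanced_class_weights(labels)
sample_weights = roll_out(labels, class_weights)

print(class_weights)        # one weight per class
print(len(sample_weights))  # one weight per sample
```

With these counts the NO class gets a weight of about 0.56 and the YES class about 4.44, so both classes contribute equally to the weighted total.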
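Add the following lines to your code:
from sklearn.utils.class_weight import compute_sample_weight
# One weight per training example, inversely proportional to its class frequency
sample_wts = compute_sample_weight("balanced", training_targets)
training_log_loss = metrics.log_loss(training_targets, training_probabilities, sample_weight=sample_wts)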
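Note that passing sample_weight to log_loss only changes the reported loss, not how the model is trained. To weight training itself, tf.estimator.LinearClassifier accepts a weight_column argument that names a feature holding per-example weights. Below is a rough sketch of preparing such a feature, assuming the input function yields a dict of features. The key "example_weight", the helper add_weight_feature, the toy "age" feature, and the rounded class weights are all hypothetical and only for illustration.

```python
def add_weight_feature(features, targets, class_weights, key="example_weight"):
    """Return a copy of the feature dict with a per-example weight column.

    `key` is a hypothetical feature name; it must match the name given
    to the estimator's weight_column argument.
    """
    weighted = dict(features)
    weighted[key] = [class_weights[t] for t in targets]
    return weighted


# Toy data for illustration only (0 = NO, 1 = YES).
features = {"age": [25, 40, 31, 58]}
targets = [0, 0, 1, 0]
# Approximate "balanced" weights for the class counts in the question.
class_weights = {0: 0.56, 1: 4.44}

weighted_features = add_weight_feature(features, targets, class_weights)
print(weighted_features["example_weight"])

# The estimator is then told which feature carries the weights, e.g.:
#   linear_classifier = tf.estimator.LinearClassifier(
#       feature_columns=feature_columns,
#       optimizer=my_optimizer,
#       weight_column="example_weight")
```

The weight feature would need to be returned by training_input_fn together with the other features so the estimator can look it up by name during training.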
Hope this helps!