Custom metric for a Keras model, using TensorFlow 2.1
I want to add a custom metric to a Keras model. I have been debugging my working code, but I cannot find a way to do what I need.
The problem is a multi-class classification problem solved with multinomial logistic regression.
The custom metric I want to implement is:
(1/Number_of_Classes)*(TruePositivesClass1/TotalElementsClass1 + TruePositivesClass2/TotalElementsClass2 + ... + TruePositivesClassN/TotalElementsClassN)
where Number_of_Classes must be computed from the batch, i.e. len(np.unique(y_true)),
and each term of the sum is something like
np.sum((y_true == class_i) & (y_pred == class_i)) / np.sum(y_true == class_i)
In terms of a confusion matrix (the minimal case, with 2 classes):
        True   False
True    15     3
False   12     1
the formula gives 0.5*(15/(15+12)) + 0.5*(1/(1+3)) ≈ 0.4028
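The arithmetic above can be checked with a few lines of NumPy. This is only a sketch: it assumes the columns of the table are the actual classes, since the formula divides each diagonal entry by its column sum, and the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the example.
# Assumption: columns are the actual classes, rows the predicted ones.
cm = np.array([[15, 3],
               [12, 1]])

true_positives = np.diag(cm)       # correct predictions per class: [15, 1]
class_totals = cm.sum(axis=0)      # column sums, as in the formula: [27, 4]
per_class = true_positives / class_totals
metric = per_class.mean()          # same as multiplying each term by 0.5
print(round(metric, 4))            # 0.4028
```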
The code might look something like this:
def custom_metric(y_true, y_pred):
    unique_values_on_target = Unique(y_true)  # How to calculate the unique elements?
    summation = 0
    for class_i in unique_values_on_target:
        # calculates the number of y_pred entries that are class_i
        true_predicts = Count(y_pred, class_i)
        # calculates the total number of items of class_i in the batch y_true
        true_values = Count(y_true, class_i)
        value = true_predicts / true_values
        summation += value
    return summation
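For illustration, the pseudocode above could be made concrete in plain NumPy roughly as follows. This is a sketch, not a Keras metric: it assumes integer class labels rather than one-hot vectors, counts only correct predictions in the numerator, and divides by the number of classes as in the formula. The name custom_metric_np is made up for the example.

```python
import numpy as np

def custom_metric_np(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true (integer labels)."""
    classes = np.unique(y_true)                 # classes that occur in this batch
    summation = 0.0
    for class_i in classes:
        in_class = y_true == class_i            # samples whose true label is class_i
        true_positives = np.sum(in_class & (y_pred == class_i))
        summation += true_positives / np.sum(in_class)
    return summation / len(classes)

y_true = np.array([1, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 0, 0, 0, 1, 1, 0, 1])
print(custom_metric_np(y_true, y_pred))  # 0.75
```

Each class here has 4 samples with 3 predicted correctly, so both terms are 0.75 and so is their mean.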
My preprocessed data is a numpy array of the form x=[v1,v2,v3,v4,...,vn], and my objective column is a numpy array y=[1, 0, 1, 0, 1, 0, 0, 1 ,..., 0, 1]
Then I convert them to tensors:
x_train = tf.convert_to_tensor(x)
y_train = tf.convert_to_tensor(tf.keras.utils.to_categorical(y))
Then I turn them into a TensorFlow dataset object:
train_ds = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(x_train),
                                tf.data.Dataset.from_tensor_slices(y_train)))
Later, I take an iterator:
train_itr = iter(
    train_ds.shuffle(len(y_train) * 5, reshuffle_each_iteration=True).batch(len(y_train)))
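Finally, I take one element from the iterator and train:
x_train, y_train = train_itr.get_next()
model.fit(x=x_train, y=y_train, batch_size=batch_size, epochs=epochs,
          callbacks=[custom_callback], validation_data=test_itr.get_next())
So, since the objects come from dataset iterators, I cannot find the functions I need to operate on them and compute the custom metric described above.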
So you want to compute the average recall over the classes present in a batch. Here is my sample code using numpy and tensorflow:
import tensorflow as tf
import numpy as np

y_t = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=np.float32)
y_p = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=np.float32)

def average_recall(y_true, y_pred):
    # Get indexes of both labels and predictions
    labels = np.argmax(y_true, axis=1)
    predictions = np.argmax(y_pred, axis=1)
    # Get confusion matrix from labels and predictions
    confusion_matrix = tf.math.confusion_matrix(labels, predictions).numpy()
    # Get number of all true positives in each class
    all_true_positives = np.diag(confusion_matrix)
    # Get number of all elements in each class
    all_class_sum = np.sum(confusion_matrix, axis=1)
    # Get rid of classes that don't show in batch
    zero_index = np.where(all_class_sum == 0)[0]
    all_true_positives = np.delete(all_true_positives, zero_index)
    all_class_sum = np.delete(all_class_sum, zero_index)
    print("confusion_matrix:\n {},\n all_true_positives:\n {},\n all_class_sum:\n {}".format(
        confusion_matrix, all_true_positives, all_class_sum))
    # Average TruePositives / TotalElements wrt all classes that show in batch
    return np.mean(all_true_positives / all_class_sum)

avg_recall = average_recall(y_t, y_p)
print(avg_recall)
Output:
confusion_matrix:
[[1 0 0 0]
[1 1 0 0]
[0 0 0 0]
[0 0 0 2]],
all_true_positives:
[1 1 2],
all_class_sum:
[1 2 2]
0.8333333333333334
TensorFlow-only implementation:
import tensorflow as tf

y_t = tf.constant([[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=tf.float32)
y_p = tf.constant([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=tf.float32)

def average_recall(y_true, y_pred):
    # Get indexes of both labels and predictions
    labels = tf.argmax(y_true, axis=1)
    predictions = tf.argmax(y_pred, axis=1)
    # Get confusion matrix from labels and predictions
    confusion_matrix = tf.math.confusion_matrix(labels, predictions)
    # Get number of all true positives in each class
    all_true_positives = tf.linalg.diag_part(confusion_matrix)
    # Get number of all elements in each class
    all_class_sum = tf.reduce_sum(confusion_matrix, axis=1)
    # Get rid of classes that don't show in batch
    mask = tf.not_equal(all_class_sum, tf.constant(0))
    all_true_positives = tf.boolean_mask(all_true_positives, mask)
    all_class_sum = tf.boolean_mask(all_class_sum, mask)
    print("confusion_matrix:\n {},\n all_true_positives:\n {},\n all_class_sum:\n {}".format(
        confusion_matrix, all_true_positives, all_class_sum))
    # Average TruePositives / TotalElements wrt all classes that show in batch
    return tf.reduce_mean(all_true_positives / all_class_sum)

avg_recall = average_recall(y_t, y_p)
print(avg_recall)
Output:
confusion_matrix:
[[1 0 0 0]
[1 1 0 0]
[0 0 0 0]
[0 0 0 2]],
all_true_positives:
[1 1 2],
all_class_sum:
[1 2 2]
tf.Tensor(0.8333333333333334, shape=(), dtype=float64)
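To use the same idea as a metric during training, a function without print or .numpy() calls is needed, since those are not available when Keras traces the metric in graph mode. The following is a minimal sketch, not the only way to do it: the name batch_average_recall is made up, y_true is assumed to be one-hot, and y_pred is assumed to hold per-class scores of shape (batch, num_classes). It replaces the confusion matrix with one-hot products, which yields the same per-class counts.

```python
import tensorflow as tf

def batch_average_recall(y_true, y_pred):
    """Mean recall over the classes present in the batch.

    Assumes y_true is one-hot and y_pred holds per-class scores,
    both of shape (batch, num_classes).
    """
    y_true = tf.cast(y_true, tf.float32)
    num_classes = tf.shape(y_pred)[-1]
    # One-hot encode the predicted class of every sample
    pred_one_hot = tf.one_hot(tf.argmax(y_pred, axis=-1), depth=num_classes)
    # Per-class true positives and per-class number of true samples
    true_positives = tf.reduce_sum(y_true * pred_one_hot, axis=0)
    class_totals = tf.reduce_sum(y_true, axis=0)
    # Recall per class; classes absent from the batch give 0 instead of NaN
    recall = tf.math.divide_no_nan(true_positives, class_totals)
    # Count only the classes that actually occur in the batch
    present = tf.cast(class_totals > 0, tf.float32)
    return tf.math.divide_no_nan(tf.reduce_sum(recall), tf.reduce_sum(present))

y_t = tf.constant([[1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=tf.float32)
y_p = tf.constant([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]], dtype=tf.float32)
result = batch_average_recall(y_t, y_p)
print(float(result))

# Hypothetical usage, assuming `model` is an already-built Keras model:
# model.compile(optimizer="adam", loss="categorical_crossentropy",
#               metrics=[batch_average_recall])
```

On the sample tensors this gives about 0.8333, matching the outputs above. Note that Keras reports the running mean of the per-batch values, so the epoch-level number is an average of batch averages rather than a recall computed over the whole dataset.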
Reference:
Calculate precision and recall for multiclass classification using confusion matrix