Correlating prediction results with labels
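I have a Keras model that predicts the following result:
(This is a multi-class problem with 6 possible classes.)
[[0.44599777 0.00667355 0.10674711 0.02558559 0.29180232 0.12319366]]
Given this output, the model predicts the first class, but I know that is wrong.
I was able to reach ~92% accuracy: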
Epoch 1/10
1128/1128 [==============================] - 18s 15ms/step - loss: 1.3685 - accuracy: 0.4596 - val_loss: 0.6238 - val_accuracy: 0.7785
Epoch 2/10
1128/1128 [==============================] - 17s 15ms/step - loss: 0.7200 - accuracy: 0.7373 - val_loss: 0.4055 - val_accuracy: 0.8467
Epoch 3/10
1128/1128 [==============================] - 17s 15ms/step - loss: 0.4994 - accuracy: 0.8200 - val_loss: 0.3284 - val_accuracy: 0.8772
Epoch 4/10
1128/1128 [==============================] - 17s 15ms/step - loss: 0.3966 - accuracy: 0.8568 - val_loss: 0.3100 - val_accuracy: 0.9043
Epoch 5/10
1128/1128 [==============================] - 18s 16ms/step - loss: 0.3428 - accuracy: 0.8810 - val_loss: 0.3044 - val_accuracy: 0.9102
Epoch 6/10
1128/1128 [==============================] - 39s 34ms/step - loss: 0.3075 - accuracy: 0.8915 - val_loss: 0.2970 - val_accuracy: 0.9184
Epoch 7/10
1128/1128 [==============================] - 25s 22ms/step - loss: 0.2779 - accuracy: 0.9035 - val_loss: 0.3051 - val_accuracy: 0.9226
Epoch 8/10
1128/1128 [==============================] - 19s 17ms/step - loss: 0.2663 - accuracy: 0.9069 - val_loss: 0.3207 - val_accuracy: 0.9261
Epoch 9/10
1128/1128 [==============================] - 19s 17ms/step - loss: 0.2514 - accuracy: 0.9138 - val_loss: 0.2855 - val_accuracy: 0.9311
Epoch 10/10
1128/1128 [==============================] - 20s 18ms/step - loss: 0.2331 - accuracy: 0.9196 - val_loss: 0.3352 - val_accuracy: 0.9263
Test loss: 0.33516398072242737
Test accuracy: 0.9262799024581909
Here is how I predict:
bug_name = '51859'
issue = conn.issue(bug_name, expand='changelog')
candidate_bug = Bug(issue, connections_dict)
candidate_bug.extract_all_info()
data = candidate_bug.get_data_as_df()
data = data.drop('group_name', axis='columns')
free_text_tokenized, _ = prepare_free_text_inputs(data, data)
model_inputs = [free_text_tokenized]
res = model.predict(model_inputs)
print(f'expected: {get_group_by_bug_owner(candidate_bug.get_owner())}')
# Generate arg maxes for predictions
print(res)
classes = np.argmax(res, axis=1)
print(classes)
print(np.unique(y_train))
class_index = classes[0]
print(np.unique(y_train)[class_index])
This is the output:
expected: D
[[0.44599777 0.00667355 0.10674711 0.02558559 0.29180232 0.12319366]]
[0]
['A' 'B' 'C' 'D' 'E' 'F']
A
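...so I'm afraid my problem is that I don't know how to "assign" these results to labels.
I have tried this several times (I know what the prediction should be), but it always misses the expected result.
Also, I use LabelEncoder as follows: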
from sklearn.preprocessing import LabelEncoder

# prepare target
def prepare_targets(y_train, y_test):
    le = LabelEncoder()
    le.fit(y_train)
    y_train_enc = le.transform(y_train)
    y_test_enc = le.transform(y_test)
    return y_train_enc, y_test_enc

y_train_enc, y_test_enc = prepare_targets(y_train, y_test)
What am I missing? Am I using the wrong list (y_train)?
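Answering my own question (for anyone who ends up here).
I found 2 problems:
1. I (very wrongly) fitted the text transformer on the prediction data (fit_on_text), which is a big no-no! You must use the same transformer that was already fitted on the training data.
2. The labels are encoded by the LabelEncoder that was fitted before training the model, so I created a dictionary that maps each encoded index back to its label, as follows: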
# prepare target
print('preparing labels')
le = LabelEncoder()
le_name_mapping = {}
le.fit(y_train)
le_name_mapping.update(dict(zip(le.transform(le.classes_), le.classes_)))
print(le_name_mapping)
y_train_enc = le.transform(y_train)
y_test_enc = le.transform(y_test)
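For reference, the same lookup can probably be done without a hand-built dictionary, because a fitted LabelEncoder already stores the labels in classes_ and can decode indices with inverse_transform. The following is only a minimal sketch: the y_train array is a toy stand-in (an assumption, not the real training labels), and the probability row is copied from the question.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for y_train (assumption); the six labels match the question's output.
y_train = np.array(['D', 'A', 'F', 'B', 'E', 'C', 'A', 'D'])

le = LabelEncoder()
le.fit(y_train)

# classes_ holds the sorted unique labels; position i is the label encoded as i.
print(le.classes_)  # ['A' 'B' 'C' 'D' 'E' 'F']

# One row of model output, copied from the question.
res = np.array([[0.44599777, 0.00667355, 0.10674711,
                 0.02558559, 0.29180232, 0.12319366]])

# argmax gives the encoded class index per row; inverse_transform maps
# those indices back to the original string labels.
predicted_indices = np.argmax(res, axis=1)
predicted_labels = le.inverse_transform(predicted_indices)
print(predicted_labels)  # ['A']
```

Note that classes_ is sorted, so index 0 is 'A' here, the same order that np.unique(y_train) prints in the question. Whichever lookup is used, the encoder has to be the one fitted on the training labels, so it needs to be kept around (for example, returned from the function that fits it).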
Later I use the le_name_mapping dictionary on the prediction result:
res = model.predict(model_inputs)
selected_class_index = np.argmax(res, axis=1)[0]
print(selected_class_index)
print(f'actual: {le_name_mapping[selected_class_index]}')
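On the first problem (fitting the text transformer again on the prediction data), a small pure-Python sketch may help show why it breaks predictions. The helpers fit_word_index and texts_to_ids are hypothetical stand-ins written for this illustration, and the sample texts are made up; they are not the code behind prepare_free_text_inputs. With the Keras Tokenizer, the equivalent would be calling fit_on_texts once on the training texts and only texts_to_sequences at prediction time.

```python
from collections import Counter

def fit_word_index(texts):
    """Hypothetical helper: map each word to an integer id, most frequent first."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Ids start at 1; ties are broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}

def texts_to_ids(texts, word_index):
    """Encode texts with an existing vocabulary; unknown words become 0."""
    return [[word_index.get(w, 0) for w in text.lower().split()] for text in texts]

# Made-up example data (assumption).
train_texts = ["login page crashes", "login button missing", "report export crashes"]
new_text = ["export button crashes"]

# Fit once, on the training data only, and keep the result.
train_index = fit_word_index(train_texts)

# Correct: encode the new text with the vocabulary the model was trained on.
good_ids = texts_to_ids(new_text, train_index)

# Wrong: refitting on the new text assigns fresh ids the model never saw.
refit_index = fit_word_index(new_text)
bad_ids = texts_to_ids(new_text, refit_index)

print(good_ids)  # [[4, 3, 1]]
print(bad_ids)   # [[3, 1, 2]]
```

The same words end up with different integer ids after refitting, so the model receives input that means something else entirely. In practice this means the fitted transformer has to be saved alongside the model and reloaded for prediction.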