How to log a confusion matrix to azureml platform using python

Question

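Hello Stack Overflowers,

I am using azureml and I would like to know whether it is possible to log the confusion matrix of the xgboost model I am training, together with the other metrics I am already logging. Here is a sample of the code I am using: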

import json

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.experiment import Experiment
from azureml.core.model import Model
from sklearn import metrics

# Load the workspace and service principal settings from a local config file
with open('./azureml.config', 'r') as f:
    config = json.load(f)

svc_pr = ServicePrincipalAuthentication(
    tenant_id=config['tenant_id'],
    service_principal_id=config['svc_pr_id'],
    service_principal_password=config['svc_pr_password'])

ws = Workspace(workspace_name=config['workspace_name'],
               subscription_id=config['subscription_id'],
               resource_group=config['resource_group'],
               auth=svc_pr)

# model, dtest, y_test and run are created earlier (not shown)
y_pred = model.predict(dtest)
y_pred_labels = (y_pred > .5).astype(int)  # threshold the predictions at 0.5

acc = metrics.accuracy_score(y_test, y_pred_labels)
run.log("accuracy", acc)
f1 = metrics.f1_score(y_test, y_pred_labels, average='binary')
run.log("f1 score", f1)

cmtx = metrics.confusion_matrix(y_test, y_pred_labels)
run.log_confusion_matrix('Confusion matrix', cmtx)

The code above raises this error:

TypeError: Object of type ndarray is not JSON serializable

I have already tried casting the matrix to a simpler one, but another error occurred, because I had previously logged a "manual" version of it (cmtx = [[30000, 50],[40, 2000]]).

run.log_confusion_matrix('Confusion matrix', [list([int(y) for y in x]) for x in cmtx])

AzureMLException: AzureMLException:
    Message: UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists.
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists."
    }
}

This makes me think that I am not using the run.log_confusion_matrix() command properly. So, once again, what is the best way to log a confusion matrix to my azureml experiments?

Answer 1

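Thanks to my colleague, I eventually found a solution. I am therefore answering my own question, to close it and perhaps help someone else.

You can find the appropriate function at this link: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py#log-confusion-matrix-name--value--description----.

In any case, you also have to take into account that Azure apparently does not support the standard confusion matrix format returned by sklearn. It only accepts a list of lists, not a numpy array filled with numpy.int64 elements. So you also have to convert the matrix to a simpler format (for simplicity, I used a nested list comprehension in the command below):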

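# y_test, y_pred and run come from the question's code; params is defined elsewhere (not shown)
cmtx = metrics.confusion_matrix(y_test, (y_pred > .5).astype(int))
cmtx = {
    "schema_type": "confusion_matrix",
    "parameters": params,
    "data": {"class_labels": ["0", "1"],
             # cast every numpy.int64 cell to a plain Python int
             "matrix": [[int(y) for y in x] for x in cmtx]}
}
run.log_confusion_matrix('Confusion matrix - error rate', cmtx)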
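
For anyone who wants to check the conversion without an Azure workspace, here is a minimal sketch of the same idea using only the standard library. The helper name build_confusion_matrix_payload is hypothetical (it is not part of azureml), the "parameters" key is left out because params is not shown in the answer, and the sample values are the "manual" matrix from the question. The json.dumps call only verifies that the payload is serializable, which is what the original TypeError was about.

```python
import json


def build_confusion_matrix_payload(matrix, class_labels):
    """Build a JSON-serializable dict laid out like the one in the answer.

    `matrix` can be any iterable of rows, for example nested lists or a
    NumPy array whose cells are numpy.int64; each cell is cast to int.
    (Hypothetical helper for illustration, not an azureml function.)
    """
    rows = [[int(cell) for cell in row] for row in matrix]
    n = len(class_labels)
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix must be square and match class_labels")
    return {
        "schema_type": "confusion_matrix",
        "data": {
            "class_labels": [str(label) for label in class_labels],
            "matrix": rows,
        },
    }


payload = build_confusion_matrix_payload([[30000, 50], [40, 2000]], [0, 1])

# Raises TypeError if any value is still a non-serializable type
serialized = json.dumps(payload)

# With a real run object, the call from the answer would then be:
# run.log_confusion_matrix('Confusion matrix - error rate', payload)
```

Note that the answer logs under a different name than the question did. The Resource Conflict message in the question indicates that the name 'Confusion matrix' was already taken in that run, so reusing it would presumably fail again even with a valid payload.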

