Access feature importance from trained Vertex AI Tabular regression model using Python

Question

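I am working with a model trained with Vertex AI Tabular AutoML in GCP. Training and batch prediction work fine. I am trying to use the feature importances in a visualization, so I want to retrieve them from Python. I can get the model evaluations with the code that @Ricco D posted for me: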


from google.cloud import aiplatform_v1 as aiplatform  # assumed import: the low-level (GAPIC) v1 client library

api_endpoint = 'us-central1-aiplatform.googleapis.com'
client_options = {"api_endpoint": api_endpoint}  # api_endpoint is required for client_options
client_model = aiplatform.services.model_service.ModelServiceClient(client_options=client_options)
project_id = 't...1'
location = 'us-central1'
model_id = '6...2'

# Full resource name of the trained model
model_name = f'projects/{project_id}/locations/{location}/models/{model_id}'
list_eval_request = aiplatform.types.ListModelEvaluationsRequest(parent=model_name)
list_eval = client_model.list_model_evaluations(request=list_eval_request)
# Print the explanation attached to each evaluation of the model
for val in list_eval:
    print(val.model_explanation)

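But I can't figure out how to get the feature importances of the model produced by the training pipeline. I can see them on the model page in the console, but I can't access them from Python.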

The ListModelEvaluationsPager object returned by the code looks like this:

  name: "projects/7...3/locations/us-central1/models/6...2/evaluations/5...0"
  metrics_schema_uri: "gs://google-cloud-aiplatform/schema/modelevaluation/regression_metrics_1.0.0.yaml"
  metrics {
    struct_value {
      fields {
        key: "meanAbsoluteError"
        value {
          number_value: 27.391115
        }
      }
      fields {
        key: "meanAbsolutePercentageError"
        value {
          number_value: 25.082605
        }
      }
      fields {
        key: "rSquared"
        value {
          number_value: 0.88434035
        }
      }
      fields {
        key: "rootMeanSquaredError"
        value {
          number_value: 47.997845
        }
      }
      fields {
        key: "rootMeanSquaredLogError"
        value {
          number_value: nan
        }
      }
    }
  }
  create_time {
    seconds: 1630550819
    nanos: 842478000
  }
}

This object has no model_explanation member, so the code returns an error.
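For reference, in the v1 API types the per-feature importances of an evaluation live under ModelEvaluation.model_explanation.mean_attributions[0].feature_attributions, and that field is only populated when the model was evaluated with explanations. Below is a minimal sketch of a hypothetical helper, top_feature_importances, that reads them defensively and ranks them for plotting. It assumes feature_attributions can be read as a mapping of feature name to number; the fake objects in the demo are illustrative stand-ins, not real API output.

```python
from types import SimpleNamespace
from typing import Any, Dict


def top_feature_importances(evaluation: Any) -> Dict[str, float]:
    """Return feature importances from a ModelEvaluation, largest magnitude first.

    Hypothetical helper. Assumes `evaluation` exposes
    model_explanation.mean_attributions[0].feature_attributions as a mapping
    of feature name -> attribution. Returns {} when no explanation is attached.
    """
    explanation = getattr(evaluation, "model_explanation", None)
    # Treat a missing/empty explanation or an empty attribution list as "no importances".
    if not explanation or not explanation.mean_attributions:
        return {}
    attributions = explanation.mean_attributions[0].feature_attributions
    # Sort by absolute value so the most influential features come first.
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {name: float(value) for name, value in ranked}


if __name__ == "__main__":
    # Fake object standing in for a real ModelEvaluation, for illustration only.
    fake = SimpleNamespace(
        model_explanation=SimpleNamespace(
            mean_attributions=[
                SimpleNamespace(feature_attributions={"age": 0.2, "income": -1.3, "zip": 0.05})
            ]
        )
    )
    print(top_feature_importances(fake))  # {'income': -1.3, 'age': 0.2, 'zip': 0.05}
```

With the real client, the helper would be called on each val in the list_eval loop from the question instead of printing val.model_explanation directly.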

Answer 1

Feature attributions are included in Vertex AI predictions through Vertex Explainable AI.
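For online prediction, the high-level Vertex AI SDK's Endpoint.explain() returns explanations alongside the predictions. A minimal sketch, assuming the model is deployed to an endpoint with explanations enabled; explain_instances is a hypothetical helper, and the endpoint object is passed in so the extraction logic does not depend on a live GCP connection.

```python
from typing import Any, Dict, List


def explain_instances(endpoint: Any, instances: List[Dict[str, Any]]) -> List[Dict[str, float]]:
    """Return one {feature: attribution} dict per instance.

    Hypothetical helper. `endpoint` is expected to behave like
    google.cloud.aiplatform.Endpoint, whose explain() response has an
    `explanations` list; each entry's attributions[0].feature_attributions
    is assumed to be readable as a mapping.
    """
    response = endpoint.explain(instances=instances)
    results = []
    for explanation in response.explanations:
        # A regression model has a single output, so only attributions[0] is read.
        attributions = explanation.attributions[0].feature_attributions
        results.append({name: float(value) for name, value in attributions.items()})
    return results


# Example with the real SDK (needs credentials and a deployed model; IDs are placeholders):
# from google.cloud import aiplatform
# endpoint = aiplatform.Endpoint("ENDPOINT_ID", project="PROJECT_ID", location="us-central1")
# print(explain_instances(endpoint, [{"feature_a": "1.0", "feature_b": "x"}]))
```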

Note that feature importance is not supported when prediction data is returned to Cloud Storage, or for forecasting models.

For batch prediction, set generate_explanation to True in your Python BatchPredictionJob, as in this example from the documentation:

# display_name, model_name, model_parameters and the format/URI variables are defined earlier
batch_prediction_job = {
    "display_name": display_name,
    # Format: 'projects/{project}/locations/{location}/models/{model_id}'
    "model": model_name,
    "model_parameters": model_parameters,
    "input_config": {
        "instances_format": instances_format,
        "bigquery_source": {"input_uri": bigquery_source_input_uri},
    },
    "output_config": {
        "predictions_format": predictions_format,
        "bigquery_destination": {"output_uri": bigquery_destination_output_uri},
    },
    # optional: return feature attributions with each prediction
    "generate_explanation": True,
}

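To make the Cloud Storage restriction explicit, here is a minimal sketch of a hypothetical helper that builds the same batch_prediction_job dict and refuses to request explanations unless the output goes to BigQuery. The field names mirror the documentation example; the helper itself, the bq:// prefix check and the fixed "bigquery" formats are assumptions to adapt (model_parameters is left out for brevity).

```python
from typing import Any, Dict


def build_batch_prediction_job(
    display_name: str,
    model_name: str,
    bigquery_source_input_uri: str,
    bigquery_destination_output_uri: str,
    generate_explanation: bool = True,
) -> Dict[str, Any]:
    """Build a BatchPredictionJob dict that reads from and writes to BigQuery.

    Hypothetical helper; field names follow the documentation example.
    """
    if generate_explanation and not bigquery_destination_output_uri.startswith("bq://"):
        # Feature importance is not returned for Cloud Storage output.
        raise ValueError("generate_explanation requires a BigQuery (bq://) destination")
    return {
        "display_name": display_name,
        # Format: 'projects/{project}/locations/{location}/models/{model_id}'
        "model": model_name,
        "input_config": {
            "instances_format": "bigquery",
            "bigquery_source": {"input_uri": bigquery_source_input_uri},
        },
        "output_config": {
            "predictions_format": "bigquery",
            "bigquery_destination": {"output_uri": bigquery_destination_output_uri},
        },
        "generate_explanation": generate_explanation,
    }


# Submitting it (requires google-cloud-aiplatform and credentials; IDs are placeholders):
# from google.cloud import aiplatform_v1
# client = aiplatform_v1.JobServiceClient(
#     client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"})
# job = build_batch_prediction_job(...)
# client.create_batch_prediction_job(
#     parent="projects/PROJECT_ID/locations/us-central1", batch_prediction_job=job)
```

Failing early in the builder surfaces the unsupported combination before a job is submitted, rather than after a job finishes without attributions.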
Answer 2

Ricco D posted a working solution, with code, in an answer to this question here:

Access feature importance from trained Vertex AI Tabular regression model using Python

google-cloud-platform

google-cloud-automl

google-ai-platform