Access feature importance from a trained Vertex AI Tabular regression model using Python
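I am working with a model trained using Tabular AutoML in Vertex AI on GCP.
Training and batch prediction work fine. I want to use the feature importances in a visualization, so I am trying to retrieve them from Python.
I can get the model evaluations using the code that @Ricco D posted for me: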
# Assumed import: the low-level client package, aliased so the names below resolve.
from google.cloud import aiplatform_v1 as aiplatform

api_endpoint = 'us-central1-aiplatform.googleapis.com'
client_options = {"api_endpoint": api_endpoint}  # api_endpoint is required for client_options
client_model = aiplatform.services.model_service.ModelServiceClient(client_options=client_options)

project_id = 't...1'
location = 'us-central1'
model_id = '6...2'
model_name = f'projects/{project_id}/locations/{location}/models/{model_id}'

# List every evaluation attached to the model.
list_eval_request = aiplatform.types.ListModelEvaluationsRequest(parent=model_name)
list_eval = client_model.list_model_evaluations(request=list_eval_request)
for val in list_eval:
    print(val.model_explanation)  # this is the line that fails
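But I can't figure out how to get the feature importances of the trained model that the training pipeline produced. I can see them on the model page, but I can't access them from Python.
The ListModelEvaluationsPager object that the code returns looks like this: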
name: "projects/7...3/locations/us-central1/models/6...2/evaluations/5...0"
metrics_schema_uri: "gs://google-cloud-aiplatform/schema/modelevaluation/regression_metrics_1.0.0.yaml"
metrics {
  struct_value {
    fields {
      key: "meanAbsoluteError"
      value {
        number_value: 27.391115
      }
    }
    fields {
      key: "meanAbsolutePercentageError"
      value {
        number_value: 25.082605
      }
    }
    fields {
      key: "rSquared"
      value {
        number_value: 0.88434035
      }
    }
    fields {
      key: "rootMeanSquaredError"
      value {
        number_value: 47.997845
      }
    }
    fields {
      key: "rootMeanSquaredLogError"
      value {
        number_value: nan
      }
    }
  }
}
create_time {
  seconds: 1630550819
  nanos: 842478000
}
This object does not have a model_explanation member, and the code raises an error.
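As a stopgap while the explanation field is unavailable, the loop can be made defensive and the metrics that are present can still be collected for plotting. The following is only a minimal sketch: `get_model_explanation` and `finite_metrics` are hypothetical helper names, the `SimpleNamespace` object is a stand-in for one pager item filled with the values printed above, and it assumes the missing field surfaces as an `AttributeError`.

```python
import math
from types import SimpleNamespace


def get_model_explanation(evaluation):
    """Return evaluation.model_explanation, or None if the attribute is absent."""
    return getattr(evaluation, "model_explanation", None)


def finite_metrics(metrics):
    """Keep only the metrics whose value is a finite number (drops nan and inf)."""
    return {
        key: float(value)
        for key, value in metrics.items()
        if isinstance(value, (int, float)) and math.isfinite(value)
    }


# Stand-in for one evaluation, using the values from the output above.
# A real ModelEvaluation is a protobuf message, not a SimpleNamespace.
evaluation = SimpleNamespace(
    metrics={
        "meanAbsoluteError": 27.391115,
        "meanAbsolutePercentageError": 25.082605,
        "rSquared": 0.88434035,
        "rootMeanSquaredError": 47.997845,
        "rootMeanSquaredLogError": float("nan"),
    }
)

explanation = get_model_explanation(evaluation)  # None instead of an exception
usable = finite_metrics(evaluation.metrics)      # the nan metric is filtered out

print(explanation)
print(sorted(usable))
```

With a real evaluation, the metrics would first have to be converted to a plain mapping; how best to do that depends on the client library version.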
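Feature attributions are included in Vertex AI predictions.
Note that feature importance is not supported when the prediction data is returned in Cloud Storage, or for forecasting models.
For batch predictions, you need to set generate_explanation to True in your Python BatchPredictionJob, as in this example from the documentation: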
batch_prediction_job = {
    "display_name": display_name,
    # Format: 'projects/{project}/locations/{location}/models/{model_id}'
    "model": model_name,
    "model_parameters": model_parameters,
    "input_config": {
        "instances_format": instances_format,
        "bigquery_source": {"input_uri": bigquery_source_input_uri},
    },
    "output_config": {
        "predictions_format": predictions_format,
        "bigquery_destination": {"output_uri": bigquery_destination_output_uri},
    },
    # optional
    "generate_explanation": True,
}
Ricco D posted a working solution with code in answer to this question.
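Once a batch prediction job run with generate_explanation has written its results, the per-row attributions still need to be aggregated into one importance score per feature for a chart. The sketch below shows one common aggregation, the mean absolute attribution. It is an illustration, not necessarily what the console displays: the output schema is not shown in this thread, so the code assumes each row has already been read and flattened into a dict of feature name to attribution value. `mean_abs_attributions`, `feature_a`, and `feature_b` are hypothetical names, and the numbers are made up.

```python
from collections import defaultdict


def mean_abs_attributions(rows):
    """Average the absolute attribution of each feature over all rows.

    Returns (feature, score) pairs sorted from most to least important.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        for feature, value in row.items():
            totals[feature] += abs(value)
            counts[feature] += 1
    means = {feature: totals[feature] / counts[feature] for feature in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Hypothetical, hand-made attributions for three predictions.
rows = [
    {"feature_a": 2.0, "feature_b": -0.5},
    {"feature_a": -4.0, "feature_b": 0.5},
    {"feature_a": 3.0, "feature_b": 1.5},
]

ranking = mean_abs_attributions(rows)
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

Taking absolute values keeps positive and negative attributions from cancelling out, which matters for a regression model where a feature can push predictions in both directions.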