Vertex AI Pipeline Failed Precondition
I have been following this video:
https://www.youtube.com/watch?v=1ykDWsnL2LE&t=310s
The code can be found here:
https://codelabs.developers.google.com/vertex-pipelines-intro#5
(I followed the video for the final two steps, which was not an issue with google_cloud_pipeline_components version 0.1.1.)
I created a pipeline in Vertex AI that runs, and I launch the pipeline run with the following code (taken from the video rather than from the code at the link above):
#run pipeline
response = api_client.create_run_from_job_spec(
"tab_classif_pipeline.json", pipeline_root = PIPELINE_ROOT,
parameter_values = {
"project" : PROJECT_ID,
"display_name" : DISPLAY_NAME
}
)
In the GCP logs I get the following error:
google.api_core.exceptions.FailedPrecondition: 400 BigQuery Dataset location `eu` must be in the same location as the service location `us-central1`.
The error occurs at the dataset_create_op step:
dataset_create_op = gcc_aip.TabularDatasetCreateOp(
project = project, display_name = display_name, bq_source = bq_source
)
My dataset lives in the EU (multi-region), so I don't understand where us-central1 comes from (or what the "service location" is).
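For reference, you can confirm where the source dataset actually lives with the BigQuery client library. A minimal sketch, assuming the MLOp_pipeline_temp dataset referenced by bq_source in the code below:

from google.cloud import bigquery

# Check the region of the BigQuery dataset referenced by bq_source.
client = bigquery.Client(project="marketingtown")
dataset = client.get_dataset("marketingtown.MLOp_pipeline_temp")
print(dataset.location)  # prints "EU" for an EU multi-region dataset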
Here is all the code I used:
PROJECT_ID = "marketingtown"
BUCKET_NAME = "gs://lookalike_model"
from typing import NamedTuple
import kfp
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import (Artifact, Input, InputPath, Model, Output,
OutputPath, ClassificationMetrics,
Metrics, component)
from kfp.v2.components.types.artifact_types import Dataset
from kfp.v2.google.client import AIPlatformClient
from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip
#set environment variables
PATH = %env PATH
%env PATH={PATH}:/home/jupyter/.local/bin
REGION = "europe-west2"
#cloud storage path where artifact is created by pipeline
PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT
import time
DISPLAY_NAME = f"lookalike_model_pipeline_{str(int(time.time()))}"
print(DISPLAY_NAME)
@kfp.dsl.pipeline(name = "lookalike-model-training-v2",
pipeline_root = PIPELINE_ROOT)
def pipeline(
bq_source : str = f"bq://{PROJECT_ID}.MLOp_pipeline_temp.lookalike_training_set",
display_name : str = DISPLAY_NAME,
project : str = PROJECT_ID,
gcp_region : str = "europe-west2",
api_endpoint : str = "europe-west2-aiplatform.googleapis.com",
thresholds_dict_str : str = '{"auPrc" : 0.3}'
):
dataset_create_op = gcc_aip.TabularDatasetCreateOp(
project = project, display_name = display_name, bq_source = bq_source
)
training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
project=project,
display_name=display_name,
optimization_prediction_type="classification",
budget_milli_node_hours=1000,
column_transformations=[
{"categorical": {"column_name": "agentId"}},
{"categorical": {"column_name": "postcode"}},
{"categorical": {"column_name": "isMobile"}},
{"categorical": {"column_name": "gender"}},
{"categorical": {"column_name": "timeOfDay"}},
{"categorical": {"column_name": "sale"}},
],
dataset=dataset_create_op.outputs["dataset"], #dataset from previous step
target_column="sale",
)
#outputted evaluation metrics
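# NOTE: classification_model_eval_metrics is a custom @component defined
# earlier in the codelab; its definition is not included in this snippet.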
model_eval_task = classification_model_eval_metrics(
project,
gcp_region,
api_endpoint,
thresholds_dict_str,
training_op.outputs["model"],
)
#if deployment threshold is met, deploy
with dsl.Condition(
model_eval_task.outputs["dep_decision"] == "true",
name="deploy_decision",
):
endpoint_op = gcc_aip.EndpointCreateOp(
project=project,
location=gcp_region,
display_name="train-automl-beans",
)
#deploys model to an endpoint
gcc_aip.ModelDeployOp(
model=training_op.outputs["model"],
endpoint=endpoint_op.outputs["endpoint"],
min_replica_count=1,
max_replica_count=1,
machine_type="n1-standard-4",
)
compiler.Compiler().compile(
pipeline_func = pipeline, package_path = "tab_classif_pipeline.json"
)
#run pipeline
response = api_client.create_run_from_job_spec(
"tab_classif_pipeline.json", pipeline_root = PIPELINE_ROOT,
parameter_values = {
"project" : PROJECT_ID,
"display_name" : DISPLAY_NAME
}
)
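Note that api_client itself is not defined anywhere in the code above. Following the codelab, it would be created roughly like this (a sketch, using the AIPlatformClient import shown earlier; as far as I understand, the region passed here determines where the pipeline job itself runs):

# Sketch: how the codelab creates api_client (not shown in the code above).
api_client = AIPlatformClient(project_id=PROJECT_ID, region=REGION)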
As confirmed by @scottlucas, this issue was solved by upgrading to the latest version of google-cloud-aiplatform, which can be done with pip install --upgrade google-cloud-aiplatform.
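After upgrading, you can confirm which version is installed with a quick check (a minimal sketch):

# Print the installed google-cloud-aiplatform SDK version.
from google.cloud import aiplatform
print(aiplatform.__version__)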
Upgrading to the latest version of the library ensures that the official documentation you consult matches the behavior of the actual product.
Posting this answer as community wiki for the benefit of the community, in case someone runs into this use case in the future. Feel free to edit this answer with additional information.
I solved this by adding a location argument to TabularDatasetCreateOp:
dataset_create_op = gcc_aip.TabularDatasetCreateOp(
project=project,
display_name=display_name,
bq_source=bq_source,
location=gcp_region,
)
I now get the same issue in the model training job, but I have learned that many of the functions in the code above take a location argument, or default to us-central1; a sketch of passing the region through to the training op is below. I will update this if I learn more.
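For example, the region can be passed to the training op as well. A sketch (whether each op exposes a location parameter may depend on your google_cloud_pipeline_components version; column_transformations is omitted here for brevity):

# Sketch: give the training op the same region so it does not fall back
# to the us-central1 default. The location parameter may vary across
# google_cloud_pipeline_components versions.
training_op = gcc_aip.AutoMLTabularTrainingJobRunOp(
    project=project,
    location=gcp_region,
    display_name=display_name,
    optimization_prediction_type="classification",
    budget_milli_node_hours=1000,
    dataset=dataset_create_op.outputs["dataset"],
    target_column="sale",
)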