400 Invalid image "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest" for deployment. Please use a Model with a valid image
What key point am I missing here? All datasets and training were done on GCP, and my setup is basic:
- Training, validation, and testing done in JupyterLab
- Model pushed to a GCP storage bucket
- Endpoint created
- Model deployed to the endpoint
All steps look fine until the last one (4). I tried the other pre-built PyTorch images Google recommends, but the error persists. The full error is below:
---------------------------------------------------------------------------
_InactiveRpcError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
65 try:
---> 66 return callable_(*args, **kwargs)
67 except grpc.RpcError as exc:
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in __call__(self, request, timeout, metadata, credentials, wait_for_ready, compression)
945 wait_for_ready, compression)
--> 946 return _end_unary_response_blocking(state, call, False, None)
947
/opt/conda/lib/python3.7/site-packages/grpc/_channel.py in _end_unary_response_blocking(state, call, with_call, deadline)
848 else:
--> 849 raise _InactiveRpcError(state)
850
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Invalid image "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest" for deployment. Please use a Model with a valid image."
debug_error_string = "{"created":"@1652032269.328842405","description":"Error received from peer ipv4:142.250.148.95:443","file":"src/core/lib/surface/call.cc","file_line":903,"grpc_message":"Invalid image "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest" for deployment. Please use a Model with a valid image.","grpc_status":3}"
>
The above exception was the direct cause of the following exception:
InvalidArgument Traceback (most recent call last)
/tmp/ipykernel_2924/2180059764.py in <module>
5 machine_type = DEPLOY_COMPUTE,
6 min_replica_count = 1,
----> 7 max_replica_count = 1
8 )
/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/models.py in deploy(self, model, deployed_model_display_name, traffic_percentage, traffic_split, machine_type, min_replica_count, max_replica_count, accelerator_type, accelerator_count, service_account, explanation_metadata, explanation_parameters, metadata, sync)
697 explanation_parameters=explanation_parameters,
698 metadata=metadata,
--> 699 sync=sync,
700 )
701
/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/base.py in wrapper(*args, **kwargs)
728 if self:
729 VertexAiResourceNounWithFutureManager.wait(self)
--> 730 return method(*args, **kwargs)
731
732 # callbacks to call within the Future (in same Thread)
/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/models.py in _deploy(self, model, deployed_model_display_name, traffic_percentage, traffic_split, machine_type, min_replica_count, max_replica_count, accelerator_type, accelerator_count, service_account, explanation_metadata, explanation_parameters, metadata, sync)
812 explanation_metadata=explanation_metadata,
813 explanation_parameters=explanation_parameters,
--> 814 metadata=metadata,
815 )
816
/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform/models.py in _deploy_call(cls, api_client, endpoint_resource_name, model_resource_name, endpoint_resource_traffic_split, deployed_model_display_name, traffic_percentage, traffic_split, machine_type, min_replica_count, max_replica_count, accelerator_type, accelerator_count, service_account, explanation_metadata, explanation_parameters, metadata)
979 deployed_model=deployed_model,
980 traffic_split=traffic_split,
--> 981 metadata=metadata,
982 )
983
/opt/conda/lib/python3.7/site-packages/google/cloud/aiplatform_v1/services/endpoint_service/client.py in deploy_model(self, request, endpoint, deployed_model, traffic_split, retry, timeout, metadata)
1155
1156 # Send the request.
-> 1157 response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
1158
1159 # Wrap the response in an operation future.
/opt/conda/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py in __call__(self, timeout, retry, *args, **kwargs)
152 kwargs["metadata"] = metadata
153
--> 154 return wrapped_func(*args, **kwargs)
155
156
/opt/conda/lib/python3.7/site-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
66 return callable_(*args, **kwargs)
67 except grpc.RpcError as exc:
---> 68 raise exceptions.from_grpc_error(exc) from exc
69
70 return error_remapped_callable
InvalidArgument: 400 Invalid image "us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest" for deployment. Please use a Model with a valid image.
Below are the details of how I create the model and the endpoint, and deploy to the endpoint.
DEPLOY_COMPUTE = 'n1-standard-4'
DEPLOY_IMAGE = 'us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest'

model = aip.Model.upload(
    display_name = f'{NOTEBOOK}_{TIMESTAMP}',
    serving_container_image_uri = DEPLOY_IMAGE,
    artifact_uri = URI,
    labels = {'notebook': f'{NOTEBOOK}'}
)

endpoint = aip.Endpoint.create(
    display_name = f'{NOTEBOOK}_{TIMESTAMP}',
    labels = {'notebook': f'{NOTEBOOK}'}
)

endpoint.deploy(
    model = model,
    deployed_model_display_name = f'{NOTEBOOK}_{TIMESTAMP}',
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)
You are importing the model with the us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest container image. However, a model trained in PyTorch cannot use a pre-built container when importing the model, because, as stated in this documentation:
You can use a pre-built container if your model meets the following requirements:
- Trained in Python 3.7 or later
- Trained using TensorFlow, scikit-learn, or XGBoost
- Exported to meet framework-specific requirements for one of the pre-built prediction containers
I suggest two workarounds:
You can refer to this documentation to create a custom prediction container image for your PyTorch-trained model.
Or
Re-train your model so it meets the requirements above and can use a pre-built container.
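The first workaround could look roughly like the sketch below. This is a hedged illustration, not a definitive fix: the image URI is a placeholder for a custom prediction container (e.g. one wrapping TorchServe) that you would build and push to Artifact Registry yourself, and the predict/health routes and port shown are TorchServe defaults, not values from the original post. The key difference from the failing code is that serving_container_image_uri points at a serving image, not a training image.

```python
# Placeholder URI — replace with a custom serving image you have built and
# pushed to your own Artifact Registry repository.
CUSTOM_SERVE_IMAGE = "us-docker.pkg.dev/PROJECT_ID/serving/pytorch-serve:latest"


def build_upload_kwargs(notebook: str, timestamp: str, artifact_uri: str) -> dict:
    """Assemble Model.upload arguments for a custom prediction container."""
    return {
        "display_name": f"{notebook}_{timestamp}",
        # A serving image, not a pre-built *training* image as in the question.
        "serving_container_image_uri": CUSTOM_SERVE_IMAGE,
        # Routes and port must match what the container's model server
        # exposes; these are TorchServe defaults — adjust for your image.
        "serving_container_predict_route": "/predictions/model",
        "serving_container_health_route": "/ping",
        "serving_container_ports": [8080],
        "artifact_uri": artifact_uri,
    }


def upload_model(**kwargs):
    # Requires google-cloud-aiplatform installed and GCP credentials;
    # imported here so the rest of the file can run without the SDK.
    from google.cloud import aiplatform as aip
    return aip.Model.upload(**kwargs)
```

After `upload_model(**build_upload_kwargs(NOTEBOOK, TIMESTAMP, URI))`, the `Endpoint.create` and `endpoint.deploy` calls from the question should work unchanged, since the deploy step only rejected the model because of its serving image.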