Exported VertexAI TabularModel model_warm_up fails when running docker

Good evening,

I have followed the instructions here:

https://cloud.google.com/vertex-ai/docs/export/export-model-tabular

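I trained a model in the Google Cloud Platform console and then exported it following those instructions. However, when I run the docker run command, I get the following output: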

docker run -v `pwd`/model-1216534849343455232/tf-saved-model/model:/models/default -p 8080:8080 -it us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server-v1

INFO:root:running uCAIP model server
2022-04-12 02:07:09.118593: I tensorflow_serving/model_servers/server.cc:85] Building single TensorFlow model file config:  model_name: default model_base_path: /models/default/predict
2022-04-12 02:07:09.118695: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
2022-04-12 02:07:09.118703: I tensorflow_serving/model_servers/server_core.cc:573]  (Re-)adding model: default
2022-04-12 02:07:09.219134: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: default version: 1}
2022-04-12 02:07:09.219153: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: default version: 1}
2022-04-12 02:07:09.219159: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: default version: 1}
2022-04-12 02:07:09.219172: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/default/predict/001
2022-04-12 02:07:09.229531: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2022-04-12 02:07:09.241239: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2022-04-12 02:07:09.256079: E external/org_tensorflow/tensorflow/core/framework/op_kernel.cc:1575] OpKernel ('op: "DecodeProtoSparse" device_type: "CPU"') for unknown op: DecodeProtoSparse
2022-04-12 02:07:09.277522: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2022-04-12 02:07:09.338428: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: /models/default/predict/001
2022-04-12 02:07:09.371063: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 151887 microseconds.
2022-04-12 02:07:09.373646: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:117] Starting to read warmup data for model at /models/default/predict/001/assets.extra/tf_serving_warmup_requests with model-warmup-options 
2022-04-12 02:07:09.573843: F external/org_tensorflow/tensorflow/core/framework/tensor_shape.cc:44] Check failed: NDIMS == dims() (1 vs. 2)Asking for tensor of 1 dimensions from a tensor of 2 dimensions
2022-04-12 02:07:09.573843: F external/org_tensorflow/tensorflow/core/framework/tensor_shape.cc:44] Check failed: NDIMS == dims() (1 vs. 2)Asking for tensor of 1 dimensions from a tensor of 2 dimensions
Aborted (core dumped)
INFO:root:connecting to TF serving at localhost:9000
INFO:root:server listening on port 8080
INFO:root:connectivity went from None to ChannelConnectivity.IDLE
INFO:root:connectivity went from ChannelConnectivity.IDLE to ChannelConnectivity.CONNECTING
INFO:root:connectivity went from ChannelConnectivity.CONNECTING to ChannelConnectivity.TRANSIENT_FAILURE
INFO:root:connectivity went from ChannelConnectivity.TRANSIENT_FAILURE to ChannelConnectivity.CONNECTING
INFO:root:connectivity went from ChannelConnectivity.CONNECTING to ChannelConnectivity.TRANSIENT_FAILURE

I'm not sure what I'm doing wrong or what I need to change to fix it.

Thanks in advance for your help.

Update:

Contents of environment.json:

{"container_uri": "us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:20220331_1125_RC00", 
"tensorflow": "2.4.1", 
"struct2tensor": "0.29.0", 
"tensorflow-addons": "0.12.1", 
"tensorflow-text": "2.4.1"}
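The environment.json file records which serving image (container_uri) and library versions the exported model expects, so it is worth checking before picking a docker image. Below is a minimal Python sketch for reading and summarizing it. It assumes environment.json sits directly inside the exported model directory you pass in; that location and the function names are assumptions for illustration, so adjust the path to wherever the file appears in your export.

```python
import json
import sys
from pathlib import Path


def read_environment(model_dir: str) -> dict:
    """Load environment.json from an exported model directory.

    Assumes the file lives directly inside model_dir (hypothetical layout).
    Returns an empty dict when the file is absent, i.e. an older export.
    """
    env_path = Path(model_dir) / "environment.json"
    if not env_path.is_file():
        return {}
    with env_path.open(encoding="utf-8") as f:
        return json.load(f)


def describe_environment(env: dict) -> str:
    """Summarize the serving image and pinned library versions."""
    if not env:
        return "no environment.json (older export)"
    lines = [f"container_uri: {env.get('container_uri', '<missing>')}"]
    # List the remaining keys (library versions) in a stable order.
    for key in sorted(k for k in env if k != "container_uri"):
        lines.append(f"{key}: {env[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    print(describe_environment(read_environment(model_dir)))
```

For the file above, the first line of the summary would be the pinned prediction-server:20220331_1125_RC00 image, which differs from the prediction-server-v1 image used in the docker run command.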

I ran into the same problem. Even retraining an old model that used to work now fails. Following Shipra Sarkar's suggestion in the comments, I tried the image

europe-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:20210820_1325_RC00

With this older image, I no longer get the error.

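This issue is caused by a compatibility mismatch between the serving image and the model. prediction-server-v1:latest is always backward compatible with existing models that have no environment.json, but it is not forward compatible with newer models that do have an environment.json. To work around this: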

  • If the model artifacts contain environment.json (newer models), use us-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:20210820_1325_RC00 or europe-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server:20210820_1325_RC00, or the image listed under container_uri in environment.json.
  • If there is no environment.json, use europe-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server-v1:latest. This image is backward compatible with all models that have no environment.json.
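The workaround above amounts to a small decision rule. Here is a Python sketch of it, assuming environment.json has already been parsed into a dict (empty when the file is absent). The function names are hypothetical, and the code only builds the argument list for the docker run command from the question; it does not execute anything.

```python
import os

# Image the answer recommends for older exports without environment.json.
V1_LATEST = "europe-docker.pkg.dev/vertex-ai/automl-tabular/prediction-server-v1:latest"


def choose_image(env: dict) -> str:
    """Pick a prediction-server image following the workaround."""
    uri = env.get("container_uri") if env else None
    if uri:
        # Newer export: use the image pinned in environment.json.
        return uri
    # Older export (no environment.json): v1:latest is backward compatible.
    return V1_LATEST


def docker_run_args(model_dir: str, image: str, host_port: int = 8080) -> list:
    """Build the docker run argv from the question with the chosen image.

    The server inside the container listens on 8080 (per the log above);
    host_port only changes the host side of the -p mapping.
    """
    return [
        "docker", "run",
        "-v", f"{os.path.abspath(model_dir)}:/models/default",
        "-p", f"{host_port}:8080",
        "-it", image,
    ]
```

Passing the resulting list to subprocess.run would start the container. If the image from container_uri still fails, the 20210820_1325_RC00 images from the first bullet are the alternative suggested above.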