Error Running TensorFlow Serving from Dockerfile

I am building a container for running TensorFlow Serving.

Here is my Dockerfile:

FROM tensorflow/serving:latest

WORKDIR /

COPY models.config /models/models.config
COPY models/mnist /models/mnist

Here is my models.config for a simple mnist model:

model_config_list {
    config: {
        name: "mnist",
        base_path: "/models/mnist"
        model_platform: "tensorflow"
        model_version_policy {
            specific {
                versions: 1646266834
            }
        }
        version_labels {
            key: 'stable'
            value: 1646266834
        }
    }
}

The models directory is laid out as follows:

$ ls -Rls models
total 0
0 drwxr-xr-x  3 david  staff  96 Mar  2 16:21 mnist

models/mnist:
total 0
0 drwxr-xr-x  6 david  staff  192 Mar  2 16:21 1646266834

models/mnist/1646266834:
total 304
  0 drwxr-xr-x  2 david  staff      64 Mar  2 16:21 assets
 32 -rw-r--r--  1 david  staff   15873 Mar  2 16:20 keras_metadata.pb
272 -rw-r--r--  1 david  staff  138167 Mar  2 16:20 saved_model.pb
  0 drwxr-xr-x  4 david  staff     128 Mar  2 16:21 variables

models/mnist/1646266834/assets:
total 0

models/mnist/1646266834/variables:
total 1424
1416 -rw-r--r--  1 david  staff  722959 Mar  2 16:20 variables.data-00000-of-00001
   8 -rw-r--r--  1 david  staff    2262 Mar  2 16:20 variables.index

The problem is that when I build and run my container, I get an error.

$ docker build -t example.com/example-tf-serving:1.0 .
$ docker run -it -p 8500:8500 -p 8501:8501 --name example-tf-serving --rm example.com/example-tf-serving:1.0

The error is as follows (Not found: /models/model):

2022-03-03 00:48:06.242923: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: model model_base_path: /models/model
2022-03-03 00:48:06.243215: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-03-03 00:48:06.243254: I tensorflow_serving/model_servers/server_core.cc:591]  (Re-)adding model: model
2022-03-03 00:48:06.243899: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:365] FileSystemStoragePathSource encountered a filesystem access error: Could not find base path /models/model for servable model with error Not found: /models/model not found

How can I fix my Dockerfile so that the commands above work?

The quick-and-easy approach below does not work for me here, so I cannot accept it as a solution:

docker run --name=the_name -p 9000:9000 -it -v "/path_to_the_model_in_computer:/path_to_model_in_docker" tensorflow/serving:1.15.0 --model_name=MODEL_NAME --port=9000

https://www.tensorflow.org/tfx/serving/docker

Optional environment variable MODEL_NAME (defaults to model)

Optional environment variable MODEL_BASE_PATH (defaults to /models)

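You are using the default values of these environment variables, so TensorFlow Serving looks for the model in /models/model. Your container stores the model at a different path, so the "/models/model not found" error is correct.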
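As a rough illustration of why the server ends up at /models/model, the sketch below composes the path from the two documented variables and their defaults. It assumes the image's entrypoint joins them as MODEL_BASE_PATH/MODEL_NAME, which matches the log line in the question; the function name resolve_model_path is hypothetical and not part of the image.

```shell
#!/bin/sh
# Sketch only: mimics how the default model path appears to be composed.
# resolve_model_path is a hypothetical helper, not something the image ships.
resolve_model_path() {
    # Fall back to the documented defaults when the variables are unset.
    name="${MODEL_NAME:-model}"
    base="${MODEL_BASE_PATH:-/models}"
    printf '%s/%s\n' "$base" "$name"
}

# Nothing set: yields /models/model, the path from the error message.
( unset MODEL_NAME MODEL_BASE_PATH; resolve_model_path )

# MODEL_NAME=mnist: yields /models/mnist, which the Dockerfile does create.
( MODEL_NAME=mnist; unset MODEL_BASE_PATH; resolve_model_path )
```

The second call corresponds to passing -e MODEL_NAME=mnist to docker run.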

I would say that simply setting the MODEL_NAME environment variable should solve the problem:

$ docker run -it -p 8500:8500 -p 8501:8501 \
  --name example-tf-serving \
  -e MODEL_NAME=mnist \
  --rm example.com/example-tf-serving:1.0

For multiple models, see https://www.tensorflow.org/tfx/serving/serving_config#model_server_configuration

The easiest way to serve a model is to provide the --model_name and --model_base_path flags (or setting the MODEL_NAME environment variable if using Docker). However, if you would like to serve multiple models, or configure options like polling frequency for new versions, you may do so by writing a Model Server config file.

You may provide this configuration file using the --model_config_file flag and instruct TensorFlow Serving to periodically poll for updated versions of this configuration file at the specified path by setting the --model_config_file_poll_wait_seconds flag.

See the Docker documentation: https://www.tensorflow.org/tfx/serving/docker#passing_additional_arguments

You need to set CMD in the Dockerfile (so that you do not have to specify it at run time, since the requirement is to use only the Dockerfile), for example:

FROM tensorflow/serving:latest

WORKDIR /

COPY models.config /models/models.config
COPY models/mnist /models/mnist

CMD ["--model_config_file=/models/models.config"]
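Once the container is running, one way to check that the mnist model loaded is to query the REST model status endpoint. The sketch below only builds the URL; it assumes the REST port 8501 is published on localhost, as in the docker run command from the question, and the helper name model_status_url is hypothetical.

```shell
#!/bin/sh
# Sketch only: builds the model status URL for TensorFlow Serving's REST API.
# model_status_url is a hypothetical helper; host and port are assumptions.
model_status_url() {
    model="$1"
    host="${2:-localhost}"   # assumed: container ports published on localhost
    port="${3:-8501}"        # REST port published in the question's docker run
    printf 'http://%s:%s/v1/models/%s\n' "$host" "$port" "$model"
}

# Print the status URL for the model named in models.config.
model_status_url mnist

# Against a running container, the URL can be passed to curl:
#   curl -s "$(model_status_url mnist)"
```

If the config file was picked up, the response should list version 1646266834 of mnist together with its state.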