KFServing: Error When Defining storageUri
I am trying to deploy a very basic Sklearn model with KFServing. This is the YAML file:
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris"
spec:
  default:
    predictor:
      sklearn:
        storageUri: file://./storage_dir
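Note that because our corporate environment cannot access Google Cloud Storage, for now I am using one of my local folders as the storageUri, and model.joblib is stored in that folder.
After deploying with kubectl apply -f sklearn.yaml -n kfserving-test, running kubectl describe revision sklearn-iris-predictor-default-fj5qt -n kfserving-test shows the following error: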
Status:
Conditions:
Last Transition Time: 2020-12-16T22:51:38Z
Message: The target is not receiving traffic.
Reason: NoTraffic
Severity: Info
Status: False
Type: Active
Last Transition Time: 2020-12-16T22:51:37Z
Message: Container failed with: [I 201216 22:50:07 storage:35] Copying contents of /mnt/models to local
Traceback (most recent call last):
File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/sklearnserver/sklearnserver/__main__.py", line 33, in <module>
model.load()
File "/sklearnserver/sklearnserver/model.py", line 36, in load
model_file = next(path for path in paths if os.path.exists(path))
StopIteration
Reason: ExitCode1
Status: False
Type: ContainerHealthy
Last Transition Time: 2020-12-16T22:51:38Z
Message: Initial scale was never achieved
Reason: ProgressDeadlineExceeded
Status: False
Type: Ready
Last Transition Time: 2020-12-16T22:51:38Z
Message: Initial scale was never achieved
Reason: ProgressDeadlineExceeded
Status: False
Type: ResourcesAvailable
Container Statuses:
Image Digest: gcr.docker.prod.walmart.com/kfserving/sklearnserver@sha256:d2553d3f2a6ba7b50736028e6dbdfb35e90ca40ee7aa5cbe0e0b66fec1695f16
Name: kfserving-container
Image Digest: gcr.docker.prod.walmart.com/kfserving/sklearnserver@sha256:d2553d3f2a6ba7b50736028e6dbdfb35e90ca40ee7aa5cbe0e0b66fec1695f16
Log URL: http://localhost:8001/api/v1/namespaces/knative-monitoring/services/kibana-logging/proxy/app/kibana#/discover?_a=(query:(match:(kubernetes.labels.knative-dev%2FrevisionUID:(query:'e6fee737-b9b8-4091-96a5-660dbf4082f8',type:phrase))))
Observed Generation: 1
Service Name: sklearn-iris-predictor-default-fj5qt
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning InternalError 2m16s revision-controller failed to update deployment "sklearn-iris-predictor-default-fj5qt-deployment": Operation cannot be fulfilled on deployments.apps "sklearn-iris-predictor-default-fj5qt-deployment": the object has been modified; please apply your changes to the latest version and try again
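The exception suggests that the model file could not be loaded or transferred, so I would like to know what I did wrong with the storageUri parameter. It should be a relative path to the model file, right? (Reference: https://github.com/kubeflow/kfserving/blob/master/python/kfserving/docs/V1alpha2SKLearnSpec.md)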
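KFServing injects a second container, called storage_initializer, into the predictor's pod (SKLearn in your case). Its job is to download the model files from the storageUri and copy them to a location inside the pod, so that the predictor does not have to handle that task.
Using file:// in the storageUri is handy for testing when building KFServing, but it requires the files to already be present locally inside the pod. A folder on your own machine is not visible there, which is why the model server finds no model file and fails with StopIteration.
If you cannot access cloud-based storage such as gs:// and s3://, you can use one of the alternatives, such as uri:// or pvc://, to serve the model files from within your local Kubernetes cluster. The KFServing repository includes examples of these.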
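As a rough illustration of the pvc:// option, the manifests below sketch what this could look like. This is a minimal sketch, not a tested configuration. The claim name sklearn-model-pvc, the service name sklearn-iris-pvc, and the 1Gi size are hypothetical. It also assumes that the cluster has a default storage class and that model.joblib has already been copied to the root of the volume behind the claim.

```yaml
# Hypothetical PersistentVolumeClaim that holds the model file.
# Assumes a default StorageClass exists in the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sklearn-model-pvc        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi               # hypothetical size
---
# InferenceService that reads the model from the claim above.
# The URI has the form pvc://<claim-name>/<path-inside-volume>.
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-pvc"       # hypothetical name
spec:
  default:
    predictor:
      sklearn:
        # Assumes model.joblib sits at the root of the volume.
        storageUri: "pvc://sklearn-model-pvc/model.joblib"
```

Both objects would be applied in the same namespace as before (for example with kubectl apply -f <file> -n kfserving-test), since the pod can only mount a claim from its own namespace.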