Docker container nvidia/k8s-device-plugin:1.9 keeps reporting an error
I am trying to set up a small Kubernetes cluster on my Ubuntu 18.04 LTS server. Every step is done now, but checking the GPU status fails. The container keeps reporting an error:
1. Problem description
I followed the Quick-Start steps, but when I run the test case, it reports an error.
2. Steps to reproduce the problem
Run the shell command:
docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:1.9
Error output:
2020/02/09 00:20:15 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2020/02/09 00:20:15 Could not register device plugin: rpc error: code = Unimplemented desc = unknown service deviceplugin.Registration
2020/02/09 00:20:15 Could not contact Kubelet, retrying. Did you enable the device plugin feature gate?
2020/02/09 00:20:15 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2020/02/09 00:20:15 You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
3. Environment information
- Output of nvidia-docker run --rm dlws/cuda nvidia-smi:
NVIDIA-SMI 440.48.02 Driver Version: 440.48.02 CUDA Version: 10.2
- Contents of /etc/docker/daemon.json:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
- Docker version: 19.03.2
- Kubernetes version: 1.15.2
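The log above points to the runtime setup guide, so it is worth confirming that the daemon.json shown here is actually in effect before looking elsewhere. A minimal sketch, assuming a Docker version whose docker info --format template exposes the DefaultRuntime field; check_default_runtime is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: verify that Docker picked up "default-runtime": "nvidia"
# from /etc/docker/daemon.json (assumes `docker info` exposes .DefaultRuntime).
check_default_runtime() {
  runtime="$(docker info --format '{{.DefaultRuntime}}')"
  if [ "$runtime" = "nvidia" ]; then
    echo "default runtime is nvidia"
  else
    # Report the mismatch on stderr and signal failure to the caller.
    echo "default runtime is '$runtime', expected nvidia" >&2
    return 1
  fi
}
```

Running check_default_runtime on the server should print "default runtime is nvidia" with the configuration above. In this case the runtime was not the problem, which narrows the cause down to the plugin and kubelet versions.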
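I finally found the answer. I hope this post helps others who run into the same problem:
For Kubernetes 1.15, use k8s-device-plugin:1.11 instead. Version 1.9 cannot communicate with the kubelet: the "unknown service deviceplugin.Registration" error means the kubelet does not serve the registration API that the 1.9 plugin tries to call.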
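Applied to the reproduction command from step 2, the fix only changes the image tag. A minimal sketch, assuming the nvidia/k8s-device-plugin:1.11 tag is pulled from the same registry as 1.9; run_device_plugin is a hypothetical helper, not something the plugin provides.

```shell
#!/bin/sh
# Hypothetical helper: start the device plugin container with the same
# hardening flags as the failing command, but with a selectable image tag.
run_device_plugin() {
  # Default to 1.11, the tag reported to work with Kubernetes 1.15.
  tag="${1:-1.11}"
  docker run --security-opt=no-new-privileges --cap-drop=ALL \
    --network=none -it \
    -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins \
    "nvidia/k8s-device-plugin:${tag}"
}
```

Calling run_device_plugin with no argument starts 1.11, while run_device_plugin 1.9 would run the original image and is expected to hit the same registration error against a 1.15 kubelet. Keeping the tag as the only parameter makes it easy to try another plugin version without retyping the flags.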