dask kubernetes aks (azure) virtual nodes
The code below creates a dask kubernetes cluster in Azure AKS.
It uses a remote scheduler (dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})) and works perfectly.
To use virtual nodes, uncomment the line extra_pod_config=virtual_config (following this official example).
With that line enabled it no longer works and fails with the following error:
ACI does not support providing args without specifying the command. Please supply both command and args to the pod spec.
This is related to passing containers: args: [dask-scheduler]
Which containers: command: should I provide to fix this?
Thanks
import dask
from dask.distributed import Client
from dask_kubernetes import KubeCluster, KubeConfig, make_pod_spec

image = "daskdev/dask"
cluster = "aks-cluster1"

dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})
dask.config.set({"distributed.comm.timeouts.connect": 180})

virtual_config = {
    "nodeSelector": {
        "kubernetes.io/role": "agent",
        "beta.kubernetes.io/os": "linux",
        "type": "virtual-kubelet",
    },
    "tolerations": [
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
    ],
}

pod_spec = make_pod_spec(
    image=image,
    # extra_pod_config=virtual_config,
    memory_limit="2G",
    memory_request="2G",
    cpu_limit=1,
    cpu_request=1,
    threads_per_worker=1,  # same as cpu
)

# az aks get-credentials --name aks-cluster1 --resource-group resource_group1
# cp ~/.kube/config ./aksconfig.yaml
auth = KubeConfig(config_file="./aksconfig.yaml", context=cluster)
cluster = KubeCluster(
    pod_spec, auth=auth, deploy_mode="remote", scheduler_service_wait_timeout=180
)
client = Client(cluster)
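The cause is a virtual kubelet protection: in the pod configuration, dask uses args to start the scheduler or a worker, but does not provide a command.
So I explicitly added the entrypoint command (command_entrypoint_explicit in the code below) and it worked: the pods were created successfully.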
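To make the rule in the error message concrete, here is a minimal sketch in plain Python. It only models the condition the message states (args present, command missing); it is not ACI's actual validation code. The helper name aci_accepts and the container dict layout are assumptions made for illustration.

```python
def aci_accepts(container: dict) -> bool:
    """Model of the rule quoted in the error message:
    a container may not carry args unless it also carries a command."""
    has_args = bool(container.get("args"))
    has_command = bool(container.get("command"))
    return has_command or not has_args


# Shape of a container that only carries args (hypothetical layout).
generated = {"name": "dask", "image": "daskdev/dask", "args": ["dask-scheduler"]}

# The explicit entrypoint used in the answer, merged into the same container.
command_entrypoint_explicit = {"command": ["tini", "-g", "--", "/usr/bin/prepare.sh"]}
patched = {**generated, **command_entrypoint_explicit}

print(aci_accepts(generated))  # False: args without command
print(aci_accepts(patched))    # True: both command and args are present
```

The args are left untouched; only the command that the image would otherwise supply implicitly through its ENTRYPOINT is spelled out.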
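Second problem: network name resolution. The workers could not connect to the scheduler through the network name tcp://{name}.{namespace}:{port}, whereas tcp://{name}.{namespace}.svc.cluster.local:{port} works.
I edited this in dask_kubernetes.core.Scheduler.start and it worked.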
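The difference between the two address forms can be sketched as follows. This is not the code inside dask_kubernetes; the function scheduler_address and the example values (service name dask-scheduler, namespace default, port 8786) are assumptions chosen for illustration.

```python
def scheduler_address(name: str, namespace: str, port: int,
                      fully_qualified: bool = False) -> str:
    """Build a scheduler address in the short or fully qualified form."""
    host = f"{name}.{namespace}"
    if fully_qualified:
        # Append the cluster-local service suffix so no DNS search list is needed.
        host += ".svc.cluster.local"
    return f"tcp://{host}:{port}"


# Hypothetical values for illustration only.
short = scheduler_address("dask-scheduler", "default", 8786)
full = scheduler_address("dask-scheduler", "default", 8786, fully_qualified=True)

print(short)  # tcp://dask-scheduler.default:8786
print(full)   # tcp://dask-scheduler.default.svc.cluster.local:8786
```

The short form only resolves when the pod's resolver has the cluster search domains configured, which was apparently not the case on the virtual nodes here.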
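Another option, which avoids editing dask_kubernetes, is the virtual_config below, whose dnsConfig adds the cluster search domains. The code below is a complete solution.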
import dask
from dask.distributed import Client
from dask_kubernetes import KubeCluster, KubeConfig, make_pod_spec

dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})
dask.config.set({"distributed.comm.timeouts.connect": 180})

image = "daskdev/dask"
cluster = "aks-cluster-prod3"

virtual_config = {
    "nodeSelector": {
        "kubernetes.io/role": "agent",
        "beta.kubernetes.io/os": "linux",
        "type": "virtual-kubelet",
    },
    "tolerations": [
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
        {"key": "azure.com/aci", "effect": "NoSchedule"},
    ],
    # Lets the short name {name}.{namespace} resolve on virtual nodes.
    "dnsConfig": {
        "options": [{"name": "ndots", "value": "5"}],
        "searches": [
            "default.svc.cluster.local",
            "svc.cluster.local",
            "cluster.local",
        ],
    },
}

# copied from: https://github.com/dask/dask-docker/blob/master/base/Dockerfile#L25
command_entrypoint_explicit = {
    "command": ["tini", "-g", "--", "/usr/bin/prepare.sh"],
}

pod_spec = make_pod_spec(
    image=image,
    extra_pod_config=virtual_config,
    extra_container_config=command_entrypoint_explicit,
    memory_limit="2G",
    memory_request="2G",
    cpu_limit=1,
    cpu_request=1,
    threads_per_worker=1,  # same as cpu
)

# az aks get-credentials --name aks-cluster1 --resource-group resource_group1
# cp ~/.kube/config ./aksconfig.yaml
auth = KubeConfig(config_file="./aksconfig.yaml", context=cluster)
cluster = KubeCluster(
    pod_spec,
    auth=auth,
    deploy_mode="remote",
    scheduler_service_wait_timeout=180,
    n_workers=1,
)
client = Client(cluster)
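As a rough illustration of why the dnsConfig in virtual_config helps, the sketch below models how a resolv.conf style resolver expands a short name using the search list and ndots. It is a simplified model, not the resolver's implementation; candidate_names and the example host dask-scheduler.default are assumptions for illustration.

```python
def candidate_names(name: str, searches: list, ndots: int) -> list:
    """Return the names a resolver would try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; no search list is applied.
        return [name]
    expanded = [f"{name}.{domain}" for domain in searches]
    if name.count(".") >= ndots:
        # Enough dots: the name is tried as given first, then the search list.
        return [name] + expanded
    # Too few dots: the search domains are tried first, then the name as given.
    return expanded + [name]


searches = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in candidate_names("dask-scheduler.default", searches, ndots=5):
    print(candidate)
# dask-scheduler.default.default.svc.cluster.local
# dask-scheduler.default.svc.cluster.local
# dask-scheduler.default.cluster.local
# dask-scheduler.default
```

The second candidate is the fully qualified service name, which is the form that worked for the workers.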