Dask: How to Add Security (TLS/SSL) to Dask Cluster?
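I am trying to figure out how to add a security layer to a Dask cluster deployed with Helm on GKE on GCP, so that users are forced to supply the certificate and key files through a Security object, as described in this documentation [1]. Unfortunately, the scheduler pod crashes and I get a timeout error. Looking at the logs, the error is as follows: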
Traceback (most recent call last):
  File "/opt/conda/bin/dask-scheduler", line 10, in <module>
    sys.exit(go())
  File "/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 226, in go
    main()
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/distributed/cli/dask_scheduler.py", line 206, in main
    **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/scheduler.py", line 1143, in __init__
    self.connection_args = self.security.get_connection_args("scheduler")
  File "/opt/conda/lib/python3.7/site-packages/distributed/security.py", line 224, in get_connection_args
    "ssl_context": self._get_tls_context(tls, ssl.Purpose.SERVER_AUTH),
  File "/opt/conda/lib/python3.7/site-packages/distributed/security.py", line 187, in _get_tls_context
    ctx = ssl.create_default_context(purpose=purpose, cafile=ca)
  File "/opt/conda/lib/python3.7/ssl.py", line 584, in create_default_context
    context.load_verify_locations(cafile, capath, cadata)
FileNotFoundError: [Errno 2] No such file or directory
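The last two frames of the traceback can be reproduced with the standard library alone, which suggests the failure is about the CA file path rather than about Dask itself. The following is a minimal sketch; `try_load_ca` is a hypothetical helper name, and the temporary directory merely stands in for a location where `myca.pem` was never created.

```python
import os
import ssl
import tempfile


def try_load_ca(cafile):
    """Return None on success, or the exception raised while loading the CA file."""
    try:
        # Same call that appears in the traceback (distributed/security.py).
        ssl.create_default_context(purpose=ssl.Purpose.SERVER_AUTH, cafile=cafile)
    except (FileNotFoundError, ssl.SSLError) as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as workdir:
    # The file is deliberately never created, mimicking a path that
    # does not exist inside the scheduler container.
    missing = os.path.join(workdir, "myca.pem")
    err = try_load_ca(missing)
    print(type(err).__name__, err)
```

If the same helper returns `None` for a given path inside the scheduler container, the CA file at that path is at least readable and parseable there.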
The Helm config YAML file is as follows:
scheduler:
  allowed-failures: 5
  env:
    - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
      value: "tls"
    - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
      value: "true"
    - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
      value: "myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
      value: "mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
      value: "myca.pem"
I created the key and certificate files as follows:
openssl req -newkey rsa:4096 -nodes -sha256 -x509 -days 3650 -nodes -out myca.pem -keyout mykey.pem
Here is a minimal, complete, verifiable example:
import dask.dataframe as dd
from dask.distributed import Client
from distributed.security import Security

sec = Security(tls_ca_file='myca.pem',
               tls_client_cert='myca.pem',
               tls_client_key='mykey.pem',
               require_encryption=True)

with Client("tls://<scheduler_ip>:8786", security=sec) as dask_client:
    ddf = dd.read_csv('gs://<bucket_name>/my_file.csv',
                      engine='python',
                      error_bad_lines=False,
                      encoding="utf-8",
                      assume_missing=True
                      )
    print(ddf.shape[0].compute())
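It looks to me like the keys were not created in your project folder (and your code seems to indicate that this is where you expect them to be). Take a look at your sec = Security() section and use full paths, for example: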
import dask.dataframe as dd
from dask.distributed import Client
from distributed.security import Security

sec = Security(tls_ca_file='<ADD_FULL_PATH_TO_PEM>/myca.pem',
               tls_client_cert='<ADD_FULL_PATH_TO_PEM>/myca.pem',
               tls_client_key='<ADD_FULL_PATH_TO_PEM>/mykey.pem',
               require_encryption=True)

with Client("tls://<scheduler_ip>:8786", security=sec) as dask_client:
    ddf = dd.read_csv('gs://<bucket_name>/my_file.csv',
                      engine='python',
                      error_bad_lines=False,
                      encoding="utf-8",
                      assume_missing=True
                      )
    print(ddf.shape[0].compute())
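One way to make the full-path advice harder to get wrong is to build the absolute paths in code and fail early if a file is missing. This is only a sketch: `resolve_tls_files` is a hypothetical helper (not part of Dask), and the default file names are simply the ones used in the question.

```python
from pathlib import Path


def resolve_tls_files(cert_dir, ca="myca.pem", cert="myca.pem", key="mykey.pem"):
    """Hypothetical helper: return absolute TLS paths, or raise if any file is missing."""
    base = Path(cert_dir).expanduser().resolve()
    paths = {
        "tls_ca_file": base / ca,
        "tls_client_cert": base / cert,
        "tls_client_key": base / key,
    }
    # Collect every missing file so the error names all of them at once.
    missing = sorted({str(p) for p in paths.values() if not p.is_file()})
    if missing:
        raise FileNotFoundError("TLS file(s) not found: " + ", ".join(missing))
    return {name: str(p) for name, p in paths.items()}


# Usage on the client side (needs the distributed package installed;
# "/path/to/certs" is a placeholder):
# from distributed.security import Security
# sec = Security(require_encryption=True, **resolve_tls_files("/path/to/certs"))
```

The dictionary keys match the keyword arguments already passed to `Security` above, so the result can be unpacked directly into the constructor.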
The link below may help you find out where the pem files are located:
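I solved the problem. Both the Dask workers and the scheduler need the certificate files referenced in their configuration. In addition, the certificates have to be added to the Docker image through the Dockerfile. See the full configuration below:
Dockerfile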
FROM daskdev/dask
RUN conda install --yes \
    -c conda-forge \
    python==3.7
ADD certs /certs/
ENTRYPOINT ["tini", "-g", "--", "/usr/bin/prepare.sh"]
Helm config
worker:
  name: worker
  image:
    repository: "gcr.io/PROJECT_ID/mydask"
    tag: "latest"
  env:
    - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
      value: "tls"
    - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
      value: "true"
    - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
      value: "certs/myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
      value: "certs/mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
      value: "certs/myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
      value: "certs/mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
      value: "certs/myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
      value: "certs/mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
      value: "certs/myca.pem"
scheduler:
  name: scheduler
  image:
    repository: "gcr.io/PROJECT_ID/mydask"
    tag: "latest"
  env:
    - name: DASK_DISTRIBUTED__COMM__DEFAULT_SCHEME
      value: "tls"
    - name: DASK_DISTRIBUTED__COMM__REQUIRE_ENCRYPTION
      value: "true"
    - name: DASK_DISTRIBUTED__COMM__TLS__CA_FILE
      value: "certs/myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__KEY
      value: "certs/mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__SCHEDULER__CERT
      value: "certs/myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__KEY
      value: "certs/mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__WORKER__CERT
      value: "certs/myca.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__KEY
      value: "certs/mykey.pem"
    - name: DASK_DISTRIBUTED__COMM__TLS__CLIENT__CERT
      value: "certs/myca.pem"
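To confirm that a configuration like this one actually points at files inside the image, a small check can be run in the scheduler or worker container. The sketch below is an assumption-laden illustration: `find_missing_tls_files` is a hypothetical helper, and it only looks at the environment variable names used in the config above. Note that relative values such as certs/myca.pem are resolved against the process's working directory, so they only match the /certs/ directory from the Dockerfile when that working directory is /.

```python
import os

PREFIX = "DASK_DISTRIBUTED__COMM__TLS__"


def find_missing_tls_files(environ=None, cwd=None):
    """Return {env var name: resolved path} for TLS variables whose file does not exist."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    missing = {}
    for name, value in environ.items():
        # Only consider the variables that name a file: CA_FILE, *__KEY, *__CERT.
        if not name.startswith(PREFIX):
            continue
        if not name.endswith(("CA_FILE", "__KEY", "__CERT")):
            continue
        # Relative values are resolved against the working directory;
        # absolute values are left unchanged by os.path.join.
        path = os.path.normpath(os.path.join(cwd, value))
        if not os.path.isfile(path):
            missing[name] = path
    return missing


if __name__ == "__main__":
    for name, path in sorted(find_missing_tls_files().items()):
        print(f"{name} -> {path} (not found)")
```

Using absolute values such as /certs/myca.pem in the env entries would remove the dependence on the working directory, though whether that is needed depends on how the image starts its processes.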