GCP Dataproc job not finding SSL pem certs stored in buckets
I have a GCP Dataproc cluster, and I am trying to deploy a PySpark job that produces to a topic over SSL.
The pem files are stored in the bucket gs://dataproc_kafka_code/code, and I am accessing them with the code shown below.
However, the code cannot find the pem files and fails with the following error:
%3|1638738651.097|SSL|rdkafka#producer-1| [thrd:app]: error:02001002:system library:fopen:No such file or directory: fopen('gs://dataproc_kafka_code/code/caroot.pem','r')
%3|1638738651.097|SSL|rdkafka#producer-1| [thrd:app]: error:2006D080:BIO routines:BIO_new_file:no such file
Traceback (most recent call last):
File "/tmp/my-job6/KafkaProducer.py", line 21, in <module>
producer = Producer(conf)
cimpl.KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create producer: ssl.ca.location failed: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib"}
Code:
from confluent_kafka import Producer

kafkaBrokers = '<host>:<port>'
# CA root certificate (ca.crt)
caRootLocation = 'gs://dataproc_kafka_code/code/caroot.pem'
# user public certificate (user.crt)
certLocation = 'gs://dataproc_kafka_code/code/my-bridge-user-crt.pem'
# user private key (user.key)
keyLocation = 'gs://dataproc_kafka_code/code/user-with-certs.pem'
password = '<password>'

conf = {'bootstrap.servers': kafkaBrokers,
        'security.protocol': 'SSL',
        'ssl.ca.location': caRootLocation,
        'ssl.certificate.location': certLocation,
        'ssl.key.location': keyLocation,
        'ssl.key.password': password
        }

topic = 'my-topic'
producer = Producer(conf)

for n in range(100):
    producer.produce(topic, key=str(n), value=" val -> "+str(n*(-1)) + " on dec 5 from dataproc ")
producer.flush()
What needs to be done to fix this?
Also, is this the right way to give the code access to the SSL certificates?
TIA!
From the error
fopen:No such file or directory: fopen('gs://dataproc_kafka_code/code/caroot.pem','r')
it appears that the Producer library is trying to read the file from the local file system, so it cannot open a gs:// URI directly.
You can fix this by first downloading the keys/certificates to local files and then pointing the conf at those local paths:
- use the Cloud Storage client library: https://googleapis.dev/python/storage/latest/client.html
- or download the files with gsutil (pre-installed on the VMs): https://cloud.google.com/storage/docs/gsutil/commands/cp
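A minimal sketch of the first option, assuming the google-cloud-storage package is installed on the cluster and the job's service account can read the bucket; the helper names and the /tmp paths are illustrative, not part of your job:

```python
# Sketch: download gs:// certs to local files before building the Producer conf.
from urllib.parse import urlparse

def split_gs_uri(uri):
    """Split a gs://bucket/object URI into (bucket, object) parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs":
        raise ValueError(f"not a gs:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

def download_to_local(uri, local_path):
    """Download one GCS object to a local file and return the local path."""
    from google.cloud import storage  # pip install google-cloud-storage
    bucket_name, blob_name = split_gs_uri(uri)
    client = storage.Client()  # uses the VM's service-account credentials
    client.bucket(bucket_name).blob(blob_name).download_to_filename(local_path)
    return local_path

# Point the Producer conf at the local copies, not the gs:// URIs:
# caRootLocation = download_to_local('gs://dataproc_kafka_code/code/caroot.pem', '/tmp/caroot.pem')
# certLocation   = download_to_local('gs://dataproc_kafka_code/code/my-bridge-user-crt.pem', '/tmp/user.crt.pem')
# keyLocation    = download_to_local('gs://dataproc_kafka_code/code/user-with-certs.pem', '/tmp/user.key.pem')
```

The gsutil alternative does the same thing from the shell before the job starts, e.g. gsutil cp gs://dataproc_kafka_code/code/*.pem /tmp/, with the conf pointing at /tmp.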