Dataproc Sqoop job with Postgres throwing error: Trust anchor for certification path not found

I am trying to submit a Sqoop job to Dataproc to export data from a Postgres database, following this article: https://medium.com/google-cloud/migrate-oracle-data-to-bigquery-using-dataproc-and-sqoop-cd3863adde7b

I get the error: org.postgresql.util.PSQLException: SSL error: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.

This is the command I am submitting (the variables are set appropriately):

gcloud dataproc jobs submit hadoop --cluster=sqoop-cluster --region=us-central1 \
  --class=org.apache.sqoop.Sqoop --jars=$libs -- import \
  -Dmapreduce.job.user.classpath.first=true \
  -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
  --connect=$JDBC_STR --username=xxx --password=xxxx --driver=org.postgresql.Driver \
  --target-dir=$STAGING_BUCKET/$TABLE --table=$SCHEMA.$TABLE \
  --enclosed-by '\"' --escaped-by \" --fields-terminated-by '|' \
  --null-string '' --null-non-string '' --as-textfile

The Postgres JDBC connection string is as follows (omitting ssl=true throws an hba_conf not found error):

JDBC_STR=jdbc:postgresql://xxxxx:5432/YYYY?ssl=true        

Detailed error:

Job [63fb49544a1141f89f9a12960cc18e18] submitted.
Waiting for job output...
/usr/lib/hadoop/libexec//hadoop-functions.sh: line 2400: HADOOP_COM.GOOGLE.CLOUD.HADOOP.SERVICES.AGENT.JOB.SHIM.HADOOPRUNCLASSSHIM_USER: invalid variable name
/usr/lib/hadoop/libexec//hadoop-functions.sh: line 2365: HADOOP_COM.GOOGLE.CLOUD.HADOOP.SERVICES.AGENT.JOB.SHIM.HADOOPRUNCLASSSHIM_USER: invalid variable name
/usr/lib/hadoop/libexec//hadoop-functions.sh: line 2460: HADOOP_COM.GOOGLE.CLOUD.HADOOP.SERVICES.AGENT.JOB.SHIM.HADOOPRUNCLASSSHIM_OPTS: invalid variable name
2021-10-14 21:48:33,931 WARN tool.SqoopTool: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2021-10-14 21:48:34,128 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2021-10-14 21:48:34,156 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2021-10-14 21:48:34,176 WARN sqoop.ConnFactory: $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2021-10-14 21:48:34,203 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
2021-10-14 21:48:34,217 INFO manager.SqlManager: Using default fetchSize of 1000
2021-10-14 21:48:34,217 INFO tool.CodeGenTool: Beginning code generation
2021-10-14 21:48:34,504 ERROR manager.SqlManager: Error executing statement: org.postgresql.util.PSQLException: SSL error: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
org.postgresql.util.PSQLException: SSL error: java.security.cert.CertPathValidatorException: Trust anchor for certification path not found.
        at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:64)

Any help is appreciated.

Thanks!

It looks like your PostgreSQL server has SSL enabled, but the client side (the Dataproc VMs) is not configured with the server certificate or its root CA.

  1. With ssl=true, the client verifies the server certificate. You can use a Dataproc init action to import the server certificate into the Dataproc VMs:
gsutil cp gs://<my-bucket>/server.crt .

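# If `JAVA_HOME` is not defined, try `/usr/lib/jvm/adoptopenjdk-8-hotspot-amd64`.
# `-storepass changeit` assumes the default trust store password; `-noprompt` skips
# the interactive confirmation, which an init action cannot answer.
keytool -keystore $JAVA_HOME/lib/security/cacerts -storepass changeit -noprompt -alias postgresql -import -file server.crt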
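  2. If you do not want the client to verify the server certificate, and instead want the server to verify the client hostname/IP and certificate, configure your server accordingly and then use sslmode=require in the connection string.

  3. For a quick test with server certificate validation disabled on the client side, try this in the JDBC connection string: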

?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
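
As a rough sketch, the three options above map to three connection strings. The host, port and database name below are hypothetical placeholders, not values from the question; only the query parameters come from the answer.

```shell
#!/usr/bin/env bash
# Hypothetical host and database name; substitute your own values.
PG_HOST="db.example.internal"
PG_DB="mydb"
BASE="jdbc:postgresql://${PG_HOST}:5432/${PG_DB}"

# Option 1: ssl=true, the client validates the server certificate
# (needs the certificate imported on the Dataproc VMs first).
JDBC_VERIFY="${BASE}?ssl=true"

# Option 2: sslmode=require, validation is left to the server side.
JDBC_REQUIRE="${BASE}?sslmode=require"

# Option 3: quick test only, the client skips certificate validation.
JDBC_NOVALIDATE="${BASE}?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory"

printf '%s\n' "$JDBC_VERIFY" "$JDBC_REQUIRE" "$JDBC_NOVALIDATE"
```

Keep the assignment quoted: an unquoted & in JDBC_STR=...?ssl=true&sslfactory=... would be read by the shell as a background operator and cut the string short.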

See this doc for more information on configuring SSL for PostgreSQL, and also a similar question for reference.
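
One detail that can trip up the keytool step in option 1: JDK 8 installs usually keep the trust store under $JAVA_HOME/jre/lib/security/cacerts, while a JRE-only install or JDK 9+ uses $JAVA_HOME/lib/security/cacerts. The helper below is a minimal sketch that checks both locations; find_cacerts is a hypothetical name, and the fallback directory is the one suggested in the answer.

```shell
#!/usr/bin/env bash
# find_cacerts JAVA_HOME_DIR
# Prints the first trust store file found under the given Java home.
find_cacerts() {
  local java_home="$1" candidate
  for candidate in \
      "$java_home/jre/lib/security/cacerts" \
      "$java_home/lib/security/cacerts"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "no cacerts file found under $java_home" >&2
  return 1
}

# Fall back to the directory mentioned in the answer when JAVA_HOME is unset.
JAVA_HOME_DIR="${JAVA_HOME:-/usr/lib/jvm/adoptopenjdk-8-hotspot-amd64}"
if CACERTS="$(find_cacerts "$JAVA_HOME_DIR")"; then
  echo "trust store: $CACERTS"
  # The import step from the answer would then point at the detected file:
  # keytool -keystore "$CACERTS" -alias postgresql -import -file server.crt
else
  echo "set JAVA_HOME to the JVM that runs the Sqoop job" >&2
fi
```

Running this on a cluster node before the import shows which file keytool should modify, assuming the Sqoop job uses the same JVM.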