NoClassDefFoundError com/microsoft/aad/adal4j/AuthenticationException while connecting to Azure SQL from GCP
I have a Spark project on GCP Dataproc. On spark-submit the driver starts.
When it tries to connect to an Azure SQL database, it throws the following exception:
20:39:15 DOCKER: Exception in thread "main" java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:3609)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:3580)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:3548)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:261)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:103)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.access0(SQLServerConnection.java:82)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerDataSource.getConnectionInternal(SQLServerDataSource.java:968)
20:39:15 DOCKER: at com.microsoft.sqlserver.jdbc.SQLServerDataSource.getConnection(SQLServerDataSource.java:69)
These are the component versions:
- dataproc: 1.5
- adal4j: 1.6.7
- azure-sqldb-spark: 1.0.2
Authentication is done through Active Directory.
The same code works locally, but not on Dataproc.
Any help is appreciated!
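It looks like you are using Docker. If so, you need to make sure that adal4j.jar
is included in the driver's Docker container, or that it is added through the --jars
flag of the Spark submit command: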
gcloud dataproc jobs submit spark \
    --cluster=$CLUSTER_NAME \
    . . . \
    --jars adal4j.jar
For reference, see how to manage Java dependencies in Spark: https://cloud.google.com/dataproc/docs/guides/manage-spark-dependencies
If you packaged your job code as a fat jar with all of its dependencies, submitted it correctly to your Dataproc cluster, and still get the error, one possible cause is a classpath conflict involving the SQL Server driver library. As I also pointed out in my comment, similar behavior has been reported, although in a different context, in several GitHub issues.
Besides trying to remove the conflicting libraries, you could try relocating the SQL Server code to a different package and using that one instead, although I don't know whether that is applicable in your use case (probably not with a database driver).
The approach is described in the GCP Dataproc documentation, for instance using the Maven shade plugin.
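To tell a missing jar apart from a classpath conflict, it may help to print where the driver JVM actually loads the relevant classes from. The following is only a minimal diagnostic sketch, not part of either library: the class name `ClasspathCheck` and the helper `locate` are hypothetical, and the two class names it checks are taken from the stack trace in the question. It assumes you can run it (or call `locate` from your job's main method) inside the same driver container.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which jar (if any) a class is loaded from.
class ClasspathCheck {

    /** Returns the location a class was loaded from, or a short note if unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            ClassLoader loader = Thread.currentThread().getContextClassLoader();
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source
            return src == null ? "(bootstrap / JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            // the class the driver failed to load in the question's stack trace
            "com.microsoft.aad.adal4j.AuthenticationException",
            // the JDBC driver; its location shows which copy wins on the classpath
            "com.microsoft.sqlserver.jdbc.SQLServerDriver"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the adal4j class is reported as NOT FOUND, the jar is most likely not reaching the driver. If it is found but the SQL Server driver resolves to a jar other than the one you packaged, that would point toward a conflict.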