NoClassDefFoundError com/microsoft/aad/adal4j/AuthenticationException while connecting to Azure SQL from GCP

I have a Spark project on GCP Dataproc. When I submit the Spark job and the driver runs, it throws the following exception as soon as I try to connect to an Azure SQL database:

20:39:15 DOCKER: Exception in thread "main" java.lang.NoClassDefFoundError: com/microsoft/aad/adal4j/AuthenticationException
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.getFedAuthToken(SQLServerConnection.java:3609)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.onFedAuthInfo(SQLServerConnection.java:3580)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.processFedAuthInfo(SQLServerConnection.java:3548)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onFedAuthInfo(tdsparser.java:261)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:103)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.sendLogon(SQLServerConnection.java:4290)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.logon(SQLServerConnection.java:3157)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.access0(SQLServerConnection.java:82)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection$LogonCommand.doExecute(SQLServerConnection.java:3121)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7151)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2478)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectHelper(SQLServerConnection.java:2026)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.login(SQLServerConnection.java:1687)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectInternal(SQLServerConnection.java:1528)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerConnection.connect(SQLServerConnection.java:866)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerDataSource.getConnectionInternal(SQLServerDataSource.java:968)
20:39:15 DOCKER:    at com.microsoft.sqlserver.jdbc.SQLServerDataSource.getConnection(SQLServerDataSource.java:69)

Below are the versions of the components:

Authentication is done through Active Directory. The same code works locally but not on Dataproc. Any help is appreciated!

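It looks like you are using Docker. If so, you need to make sure that adal4j.jar is included in the driver's Docker container, or that it is added through the --jars flag of the Spark submit command: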

gcloud dataproc jobs submit spark \
  --cluster=$CLUSTER_NAME \
  . . . \
  --jars adal4j.jar

For reference, see how to manage Java dependencies in Spark: https://cloud.google.com/dataproc/docs/guides/manage-spark-dependencies
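To confirm whether the adal4j class is actually visible to the driver at runtime, a small check like the one below could be run at the very start of the job, before opening the connection. This is only a minimal sketch: the class name `ClasspathCheck` and the helper `isClassPresent` are made up for illustration, and the only name taken from the question is the missing class in the stack trace.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by the class loader
     * that loaded this helper (hypothetical helper, not part of any library).
     */
    public static boolean isClassPresent(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError raised by missing transitive classes
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the NoClassDefFoundError in the question
        String adal = "com.microsoft.aad.adal4j.AuthenticationException";
        System.out.println(adal + " present: " + isClassPresent(adal));
    }
}
```

If this prints `false` on Dataproc but `true` locally, the jar is not reaching the driver's classpath, which points back to the Docker image contents or the --jars flag.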

If you packaged your job code as a fat jar with all of its dependencies, submitted it appropriately to your Dataproc cluster, and still face the error, one possible cause is a classpath conflict involving the SQL Server driver library. As I also pointed out in my comment, although in a different context, similar behavior has been reported in several GitHub issues.
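One way to look for such a conflict is to print which jar the driver class was actually loaded from on the cluster. The following is a rough sketch under that assumption: `CodeSourceLocator` and `locationOf` are hypothetical names, and the driver class is looked up by name so the snippet compiles without the driver on the compile classpath.

```java
import java.security.CodeSource;

public class CodeSourceLocator {

    /**
     * Returns the location (usually a jar URL) a class was loaded from,
     * or a placeholder when the JVM does not expose one (e.g. core JDK classes).
     */
    public static String locationOf(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Standard entry class of the Microsoft JDBC driver
        String driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
        try {
            Class<?> driver = Class.forName(driverClass);
            System.out.println(driverClass + " loaded from: " + locationOf(driver));
        } catch (ClassNotFoundException e) {
            System.out.println(driverClass + " is not on the classpath");
        }
    }
}
```

If the reported location is a jar shipped with the cluster image rather than your fat jar, a different driver version is probably shadowing the one you packaged.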

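Besides trying to remove the conflicting library, you could perhaps try relocating the SQL Server code to a different package and using that one instead. I don't know whether this applies to your use case (it may well not for a database driver).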

This approach is described in the GCP Dataproc documentation, for instance, using the Maven shade plugin.
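As an illustration of what such a relocation might look like, here is a sketch of a `pom.xml` fragment for the Maven shade plugin. The `shaded.` prefix is an arbitrary choice, the plugin version is deliberately left out, and whether relocating the driver actually resolves the conflict in your setup is an assumption that needs testing.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- version omitted on purpose; pin one that suits your build -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Merges META-INF/services entries so JDBC driver registration survives shading -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
        <relocations>
          <relocation>
            <!-- Move the driver classes to a private package (prefix is hypothetical) -->
            <pattern>com.microsoft.sqlserver</pattern>
            <shadedPattern>shaded.com.microsoft.sqlserver</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shade plugin also rewrites references in your own classes inside the fat jar, so the job code can keep importing `com.microsoft.sqlserver.jdbc` as before.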