AWS Glue fails to run the Snowflake Python connection as in the example https://www.snowflake.com/blog/how-to-use-aws-glue-with-snowflake/
(Submitting the following thread to help other Snowflake users know how this works with AWS Glue.)
I am trying to implement the Snowflake connection in my AWS Glue job, as described in this example:
https://www.snowflake.com/blog/how-to-use-aws-glue-with-snowflake/
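For reference, a minimal sketch of the read I am attempting, following that example (the connection values below are placeholders, and the option names follow the Snowflake Spark connector documentation):

# Minimal sketch of the Snowflake read from a Glue (glueetl) PySpark script.
# All connection values are placeholders.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

sfOptions = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Load a table from Snowflake into a Spark DataFrame
df = (
    spark.read.format(SNOWFLAKE_SOURCE_NAME)
    .options(**sfOptions)
    .option("dbtable", "<table>")
    .load()
)

df.printSchema()
df.show()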
I am using the latest versions:
- spark-snowflake_2.12-2.5.2-spark_2.4
- snowflake-jdbc-3.9.1
- Glue version: Spark 2.4, Python 3 (Glue version 1.0)
But I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o75.load.
: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at net.snowflake.spark.snowflake.Parameters$MergedParameters.<init>(Parameters.scala:208)
at net.snowflake.spark.snowflake.Parameters$.mergeParameters(Parameters.scala:202)
at net.snowflake.spark.snowflake.DefaultSource.createRelation(DefaultSource.scala:59)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Answer #1:
Maybe I am being a bit general here, but I tend to think errors like this come from a version incompatibility. What is your Scala version? Here is some general information that may help with the setup.
I believe AWS Glue now supports Spark 2.4.3, but it is still quite new and may not have been fully tested yet. You are welcome to try it, or to try Spark 2.2.1, which we know AWS Glue also supports.
Because we know that combination works with AWS Glue, I will go into detail on Spark 2.2.1 here. The first link below is the documentation on AWS Glue's support for this version, and the second is where you can download the spark-snowflake connector built for it:
- https://aws.amazon.com/about-aws/whats-new/2018/04/aws-glue-now-supports-apache-spark-221/
- https://repo1.maven.org/maven2/net/snowflake/spark-snowflake_2.11/2.2.1/spark-snowflake_2.11-2.2.1.jar
For Spark 2.2.1, I believe you need Scala 2.11, which you can find here:
You can use the latest Snowflake JDBC driver, which you can download here:
Hope this helps. Let us know if you still run into the same error.
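As a side note, the connector and JDBC driver jars are attached to a Glue job through the job's dependent-jars setting (the --extra-jars argument). A rough sketch of creating such a job with boto3, where the job name, IAM role, and S3 paths are all placeholders:

# Rough sketch: create a Glue job (no GlueVersion specified, so it defaults
# to Glue 0.9 / Spark 2.2.1) with the spark-snowflake connector and the
# Snowflake JDBC driver on the classpath. Names, role ARN, and S3 paths
# are placeholders.
import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="snowflake-glue-job",
    Role="arn:aws:iam::123456789012:role/MyGlueRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/snowflake_job.py",
    },
    DefaultArguments={
        # Comma-separated S3 paths to the jars Glue adds to the Java classpath
        "--extra-jars": (
            "s3://my-bucket/jars/spark-snowflake_2.11-2.2.1.jar,"
            "s3://my-bucket/jars/snowflake-jdbc-<latest>.jar"
        ),
    },
)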
Response #2:
With the help of your suggestions, I was able to connect to a table in the Snowflake database and load it into a DataFrame.
I am using these versions:
1) Spark 2.2, Python 2 (Glue version 0.9)
2) spark-snowflake_2.11-2.2.1.jar
3) snowflake-jdbc-3.2.7
Everything is fine when I run df.printSchema(), but when I run df.show() it gives the error below.
I am also unable to save this DataFrame to an S3 bucket.
py4j.protocol.Py4JJavaError: An error occurred while calling o75.showString.
: java.lang.NoClassDefFoundError: net/snowflake/client/jdbc/internal/snowflake/common/core/S3FileEncryptionMaterial
at net.snowflake.spark.snowflake.ConnectorSFStageManager.encMat$lzycompute(ConnectorSFStageManager.scala:203)
at net.snowflake.spark.snowflake.ConnectorSFStageManager.encMat(ConnectorSFStageManager.scala:201)
at net.snowflake.spark.snowflake.ConnectorSFStageManager.masterKey$lzycompute(ConnectorSFStageManager.scala:231)
at net.snowflake.spark.snowflake.ConnectorSFStageManager.masterKey(ConnectorSFStageManager.scala:230)
at net.snowflake.spark.snowflake.SnowflakeRDD.<init>(SnowflakeRDD.scala:60)
at net.snowflake.spark.snowflake.SnowflakeRelation.getRDDFromS3(SnowflakeRelation.scala:189)
at net.snowflake.spark.snowflake.SnowflakeRelation.buildScanFromSQL(SnowflakeRelation.scala:103)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.toRDD(QueryBuilder.scala:81)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.rdd$lzycompute(QueryBuilder.scala:28)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder.rdd(QueryBuilder.scala:28)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$$anonfun$getRDDFromPlan.apply(QueryBuilder.scala:183)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$$anonfun$getRDDFromPlan.apply(QueryBuilder.scala:182)
at scala.Option.map(Option.scala:146)
at net.snowflake.spark.snowflake.pushdowns.querygeneration.QueryBuilder$.getRDDFromPlan(QueryBuilder.scala:182)
at net.snowflake.spark.snowflake.pushdowns.SnowflakeStrategy.buildQueryRDD(SnowflakeStrategy.scala:35)
at net.snowflake.spark.snowflake.pushdowns.SnowflakeStrategy.apply(SnowflakeStrategy.scala:20)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun.apply(QueryPlanner.scala:62)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun.apply(QueryPlanner.scala:62)
at scala.collection.Iterator$$anon.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon.hasNext(Iterator.scala:439)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2837)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2150)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2363)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:241)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: net.snowflake.client.jdbc.internal.snowflake.common.core.S3FileEncryptionMaterial
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Answer #2:
Can you try again with a later version of the JDBC driver?
Response #3:
Thank you for looking into this. I was able to resolve the issue with the following combination of versions:
Spark 2.4, Python 3 (Glue version 1.0)
spark-snowflake_2.11-2.4.8.jar
snowflake-jdbc-3.6.12.jar
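With this combination the DataFrame loads and displays correctly. For reference, saving it to an S3 bucket is just the standard Spark DataFrameWriter call; a minimal sketch, with a placeholder bucket path:

# Minimal sketch: write the DataFrame loaded from Snowflake to S3 as Parquet.
# The bucket path is a placeholder.
df.write.mode("overwrite").parquet("s3://<my-bucket>/snowflake_export/")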
Answer #3:
Thank you as well for posting the versions, so that other users know what works with AWS Glue.