Zeppelin does not load Maven jar
Apache Zeppelin version 0.7.1
%dep
z.reset() // clean up previously added artifact and repository
// add maven repository
z.addRepo("Spark Cassandra Connector 2.0.10").url("https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector")
// add artifact recursively
// z.load("groupId:artifactId:version")
z.load("com.datastax.spark:spark-cassandra-connector_2.11:2.0.10")
java.lang.NullPointerException
at org.sonatype.aether.impl.internal.DefaultRepositorySystem.resolveDependencies(DefaultRepositorySystem.java:352)
at org.apache.zeppelin.spark.dep.SparkDependencyContext.fetchArtifactWithDep(SparkDependencyContext.java:171)
at org.apache.zeppelin.spark.dep.SparkDependencyContext.fetch(SparkDependencyContext.java:121)
at org.apache.zeppelin.spark.DepInterpreter.interpret(DepInterpreter.java:245)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:95)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:490)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The following did not seem to help:
- apache zeppelin additional repository import
- https://zeppelin.apache.org/docs/0.7.1/interpreter/spark.html gives an example that sets zeppelin.dep.additionalRemoteRepository to http://dl.bintray.com/spark-packages/maven, but that repo does not have the jar version I need
- https://zeppelin.apache.org/docs/0.7.1/interpreter/spark.html#3-dynamic-dependency-loading-via-sparkdep-interpreter
- Zeppelin dynamic dependency loading fails on os-maven-plugin
Now I get
%dep
z.load("com.datastax.spark:spark-cassandra-connector_2.11:2.0.10")
org.sonatype.aether.resolution.DependencyResolutionException: Could not find artifact com.datastax.spark:spark-cassandra-connector_2.11:jar:2.0.10 in central (https://repo1.maven.org/maven2/)
I needed to update the Maven repository URL in zeppelin-env.sh from http to https:
export ZEPPELIN_INTERPRETER_DEP_MVNREPO="https://repo1.maven.org/maven2/"
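If an existing zeppelin-env.sh already sets this variable with an http:// URL, the change can be scripted. The following is only a sketch: the function name `use_https_repo` is hypothetical, the old value `http://repo1.maven.org/maven2/` is an assumption about what the file contained, and the demo runs against a throwaway temp file rather than a real configuration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an http:// Maven Central URL to https://
# in the file given as $1, keeping the file's permissions.
use_https_repo() {
  tmp=$(mktemp) || return 1
  sed 's|http://repo1\.maven\.org/maven2|https://repo1.maven.org/maven2|g' "$1" > "$tmp" \
    && cat "$tmp" > "$1"
  status=$?
  rm -f "$tmp"
  return $status
}

# Demo on a temp file that stands in for conf/zeppelin-env.sh (assumed old value).
demo=$(mktemp)
printf '%s\n' 'export ZEPPELIN_INTERPRETER_DEP_MVNREPO="http://repo1.maven.org/maven2/"' > "$demo"
use_https_repo "$demo"
cat "$demo"
```

Zeppelin reads zeppelin-env.sh at startup, so it would need a restart after the real file is changed.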
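There were also other, non-generic changes specific to our build system (a Jenkins pipeline).
If you control the server, it is much easier to download the jar you need (i.e. "groupArtifactVersion": "com.datastax.spark:spark-cassandra-connector_2.11:2.0.10") and then load the dependency from disk, for example: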
%dep
z.load("/opt/zeppelin/spark-cassandra-connector-assembly/spark-cassandra-connector-assembly.jar")
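The download step could be sketched as follows. This is a minimal example assuming the standard Maven 2 repository layout; the helper name `maven_jar_url` is hypothetical, and the default repository root is taken from the `ZEPPELIN_INTERPRETER_DEP_MVNREPO` value above. It only prints the URL; the actual download is left as a comment because it needs network access.

```shell
#!/bin/sh
# Hypothetical helper: map "groupId:artifactId:version" to the jar URL
# under a Maven 2 layout repository (second argument is optional).
maven_jar_url() {
  repo="${2:-https://repo1.maven.org/maven2}"
  group=$(printf '%s' "$1" | cut -d: -f1 | tr . /)   # dots in groupId become path separators
  artifact=$(printf '%s' "$1" | cut -d: -f2)
  version=$(printf '%s' "$1" | cut -d: -f3)
  printf '%s/%s/%s/%s/%s-%s.jar\n' \
    "${repo%/}" "$group" "$artifact" "$version" "$artifact" "$version"
}

maven_jar_url "com.datastax.spark:spark-cassandra-connector_2.11:2.0.10"

# To fetch it (target directory is an assumption):
# curl -fL -o /opt/zeppelin/spark-cassandra-connector_2.11-2.0.10.jar \
#   "$(maven_jar_url com.datastax.spark:spark-cassandra-connector_2.11:2.0.10)"
```

Note that a plain jar fetched this way does not bundle its transitive dependencies, which is presumably why the path above points at an assembly jar.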