Accessing HDFS from docker-hadoop-spark-workbench via Zeppelin
I have installed https://github.com/big-data-europe/docker-hadoop-spark-workbench and started it with docker-compose up. I navigated to the various URLs mentioned in the git README, and everything appears to be up.
I then started a local Apache Zeppelin:
./bin/zeppelin.sh start
In the Zeppelin interpreter settings, I navigated to the Spark interpreter and updated master to point at the local cluster where Docker is installed:
master: changed from local[*] to spark://localhost:8080
Then I ran the following code in a notebook:
import org.apache.hadoop.fs.{FileSystem, Path}
FileSystem.get(sc.hadoopConfiguration).listStatus(new Path("hdfs:///")).foreach(x => println(x.getPath))
And I get this exception in the Zeppelin log:
INFO [2017-12-15 18:06:35,704] ({pool-2-thread-2} Paragraph.java[jobRun]:362) - run paragraph 20171212-200101_1553252595 using null org.apache.zeppelin.interpreter.LazyOpenInterpreter@32d09a20
WARN [2017-12-15 18:07:37,717] ({pool-2-thread-2} NotebookServer.java[afterStatusChange]:2064) - Job 20171212-200101_1553252595 is finished, status: ERROR, exception: null, result: %text java.lang.NullPointerException
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:38)
at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:33)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext_2(SparkInterpreter.java:398)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:387)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:843)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:491)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.FIFOScheduler.run(FIFOScheduler.java:139)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access1(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
How can I access HDFS from Zeppelin and from Java/Spark code?
The exception occurs because, for some reason, the sparkSession object is null in Zeppelin:
// From Zeppelin's SparkInterpreter: sparkSession is null here, so reflecting on it throws the NPE.
private SparkContext createSparkContext_2() {
  return (SparkContext) Utils.invokeMethod(sparkSession, "sparkContext");
}
This is probably a configuration-related issue. Cross-check the interpreter settings/configuration against your Spark cluster setup, and make sure Spark itself is working correctly.
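One specific thing worth checking (an assumption on my part, based on the port in your post): spark://localhost:8080 is normally the standalone master's web UI port, while the master itself listens on 7077 by default, so the interpreter setting would usually be spark://localhost:7077. A quick sanity-check paragraph in a Zeppelin note, assuming the default %spark interpreter, is:
println(sc.version)  // fails with the same NPE if the SparkContext cannot be created
println(sc.master)   // should print the configured master URL, e.g. spark://localhost:7077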
Reference: https://zeppelin.apache.org/docs/latest/interpreter/spark.html
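Once the interpreter starts cleanly, your original listing code should work. If the default filesystem is not picked up from the Hadoop configuration, you can also point at the namenode explicitly; a minimal sketch (the host and port below are assumptions, so verify the namenode port mapping in your docker-compose.yml):
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
// hdfs://localhost:8020 is an assumed namenode address; check the port exposed by docker-compose.
val fs = FileSystem.get(new URI("hdfs://localhost:8020"), sc.hadoopConfiguration)
fs.listStatus(new Path("/")).foreach(status => println(status.getPath))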
Hope it helps.