SAP Vora 1.2 - Reading Vora tables from HANA
!!! UPDATE !!!
After hours of going through the documentation I finally found the problem. It turned out I was missing some parameters in the Yarn configuration.
Here is what I did:
- Open the yarn-site.xml file in an editor, or log in to the Ambari web UI and select Yarn > Config. Find the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new property value should be "mapreduce_shuffle,spark_shuffle".
- Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class" and set it to "org.apache.spark.network.yarn.YarnShuffleService".
- Copy the spark-<version>-yarn-shuffle.jar file (downloaded in the "Install Spark Assembly Files and Dependent Libraries" step) from Spark to the Hadoop-Yarn classpath on all NodeManager hosts. Usually this folder is /usr/hdp/<version>/hadoop-yarn/lib.
- Restart Yarn and the NodeManagers.
!!!!!!!!!!!!!
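For reference, a minimal sketch of how those two properties might look in yarn-site.xml (values taken from the steps above; the rest of the file stays unchanged):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>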
I am using SAP Vora 1.2 Developer Edition with the newest Spark Controller (HANASPARKCTRL00P_5-70001262.RPM). I loaded a table into Vora in the spark-shell. I can see the table in SAP HANA Studio in the "spark_velocity" folder, and I can add the table as a virtual table. The problem is that I cannot select or preview the data in the table because of this error:
Error: SAP DBTech JDBC: [403]: internal error: Error opening the
cursor for the remote database for query "SELECT
"SPARK_testtable"."a1", "SPARK_testtable"."a2", "SPARK_testtable"."a3"
FROM "spark_velocity"."testtable" "SPARK_testtable" LIMIT 200 "
Here is my hanaes-site.xml file:
<configuration>
<!-- You can either copy the assembly jar into HDFS or to lib/external directory.
Please maintain appropriate value here-->
<property>
<name>sap.hana.es.spark.yarn.jar</name>
<value>file:///usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar</value>
<final>true</final>
</property>
<property>
<name>sap.hana.es.server.port</name>
<value>7860</value>
<final>true</final>
</property>
<!-- Required if you are copying your files into HDFS-->
<property>
<name>sap.hana.es.lib.location</name>
<value>hdfs:///sap/hana/spark/libs/thirdparty/</value>
<final>true</final>
</property>
-->
<!--Required property if using controller for DLM scenarios-->
<!--
<property>
<name>sap.hana.es.warehouse.dir</name>
<value>/sap/hana/hanaes/warehouse</value>
<final>true</final>
</property>
-->
<property>
<name>sap.hana.es.driver.host</name>
<value>ip-10-0-0-[censored].ec2.internal</value>
<final>true</final>
</property>
<!-- Change this value to vora when connecting to Vora store -->
<property>
<name>sap.hana.hadoop.datastore</name>
<value>vora</value>
<final>true</final>
</property>
<!-- // When running against a kerberos protected cluster, please maintain appropriate values
<property>
<name>spark.yarn.keytab</name>
<value>/usr/sap/spark/controller/conf/hanaes.keytab</value>
<final>true</final>
</property>
<property>
<name>spark.yarn.principal</name>
<value>hanaes@PAL.SAP.CORP</value>
<final>true</final>
</property>
-->
<!-- To enable Secure Socket communication, please maintain appropriate values in the follwing section-->
<property>
<name>sap.hana.es.ssl.keystore</name>
<value></value>
<final>false</final>
</property>
<property>
<name>sap.hana.es.ssl.clientauth.required</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>sap.hana.es.ssl.verify.hostname</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>sap.hana.es.ssl.keystore.password</name>
<value></value>
<final>true</final>
</property>
<property>
<name>sap.hana.es.ssl.truststore</name>
<value></value>
<final>true</final>
</property>
<property>
<name>sap.hana.es.ssl.truststore.password</name>
<value></value>
<final>true</final>
</property>
<property>
<name>sap.hana.es.ssl.enabled</name>
<value>false</value>
<final>true</final>
</property>
<property>
<name>spark.executor.instances</name>
<value>10</value>
<final>true</final>
</property>
<property>
<name>spark.executor.memory</name>
<value>5g</value>
<final>true</final>
</property>
<!-- Enable the following section if you want to enable dynamic allocation-->
<!--
<property>
<name>spark.dynamicAllocation.enabled</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>spark.dynamicAllocation.minExecutors</name>
<value>10</value>
<final>true</final>
</property>
<property>
<name>spark.dynamicAllocation.maxExecutors</name>
<value>20</value>
<final>true</final>
</property>
<property>
<name>spark.shuffle.service.enabled</name>
<value>true</value>
<final>true</final>
</property>
<property>
<name>sap.hana.ar.provider</name>
<value>com.sap.hana.aws.extensions.AWSResolver</value>
<final>true</final>
</property>
<property>
<name>spark.vora.hosts</name>
<value>ip-10-0-0-[censored].ec2.internal:2022,ip-10-0-0-[censored].ec2.internal:2022,ip-10-0-0-[censored].ec2.internal:2022</value>
<final>true</final>
</property>
<property>
<name>spark.vora.zkurls</name>
<value>ip-10-0-0-[censored].ec2.internal:2181,ip-10-0-0-[censored].ec2.internal:2181,ip-10-0-0-[censored].ec2.internal:2181</value>
<final>true</final>
</property>
</configuration>
ls /usr/sap/spark/controller/lib/external/
spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar
hdfs dfs -ls /sap/hana/spark/libs/thirdparty
Found 4 items
-rwxrwxrwx 3 hdfs hdfs 366565 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-api-jdo-4.2.1.jar
-rwxrwxrwx 3 hdfs hdfs 2006182 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-core-4.1.2.jar
-rwxrwxrwx 3 hdfs hdfs 1863315 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/datanucleus-rdbms-4.1.2.jar
-rwxrwxrwx 3 hdfs hdfs 627814 2016-05-11 13:09 /sap/hana/spark/libs/thirdparty/joda-time-2.9.3.jar
ls /usr/hdp/
2.3.4.0-3485 2.3.4.7-4 current
vi /var/log/hanaes/hana_controller.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/spark-sap-datasources-1.2.33-assembly.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/sap/spark/controller/lib/external/spark-assembly-1.5.2.2.3.4.0-3485-hadoop2.7.1.2.3.4.0-3485.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.3.4.0-3485/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/12 07:02:38 INFO HanaESConfig: Loaded HANA Extended Store Configuration
Found Spark Libraries. Proceeding with Current Class Path
16/05/12 07:02:39 INFO Server: Starting Spark Controller
16/05/12 07:03:11 INFO CommandRouter: Connecting to Vora Engine
16/05/12 07:03:11 INFO CommandRouter: Initialized Router
16/05/12 07:03:11 INFO CommandRouter: Server started
16/05/12 07:03:43 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729323_f17e36cf-0003-0015-452e-800c700001ee
16/05/12 07:03:48 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729329_f17e36cf-0003-0015-452e-800c700001f4
16/05/12 07:03:48 INFO VoraClientFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:03:48 INFO CBinder: searching for compat-sap-c++.so at /opt/rh/SAP/lib64/compat-sap-c++.so
16/05/12 07:03:48 WARN CBinder: could not find compat-sap-c++.so
16/05/12 07:03:48 INFO CBinder: searching for libpam.so.0 at /lib64/libpam.so.0
16/05/12 07:03:48 INFO CBinder: loading libpam.so.0 from /lib64/libpam.so.0
16/05/12 07:03:48 INFO CBinder: loading library libprotobuf.so
16/05/12 07:03:48 INFO CBinder: loading library libprotoc.so
16/05/12 07:03:48 INFO CBinder: loading library libtbbmalloc.so
16/05/12 07:03:48 INFO CBinder: loading library libtbb.so
16/05/12 07:03:48 INFO CBinder: loading library libv2runtime.so
16/05/12 07:03:48 INFO CBinder: loading library libv2net.so
16/05/12 07:03:48 INFO CBinder: loading library libv2catalog_connector.so
16/05/12 07:03:48 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:11:56 INFO CommandHandler: Getting BROWSE data/user/17401406272892502037-4985062628452729335_f17e36cf-0003-0015-452e-800c700001fa
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:11:56 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/12 07:12:02 INFO CatalogFactory: returning a Vora catalog client of this Vora catalog server: master.i-14371789.cluster:2204
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:02 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/12 07:12:02 INFO Utils: freeing the buffer
16/05/12 07:12:11 INFO Utils: freeing the buffer
16/05/12 07:14:15 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/12 07:14:15 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 0 cancelled part of cancelled job group f17e36cf-0003-0015-452e-800c70000216
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled.apply$mcVI$sp(DAGScheduler.scala:681)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled.apply(DAGScheduler.scala:681)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled.apply(DAGScheduler.scala:681)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition.apply(RDD.scala:902)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition.apply(RDD.scala:900)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$$anonfun$applyOrElse.apply(CommandRouter.scala:383)
at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$$anonfun$applyOrElse.apply(CommandRouter.scala:362)
at scala.collection.immutable.List.foreach(List.scala:318)
at com.sap.hana.spark.network.CommandHandler$$anonfun$receive.applyOrElse(CommandRouter.scala:362)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at com.sap.hana.spark.network.CommandHandler.aroundReceive(CommandRouter.scala:204)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Another strange thing is this warning:
16/05/12 07:03:48 INFO CBinder: searching for compat-sap-c++.so at /opt/rh/SAP/lib64/compat-sap-c++.so
16/05/12 07:03:48 WARN CBinder: could not find compat-sap-c++.so
because I do have the file in that location:
ls /opt/rh/SAP/lib64/
compat-sap-c++.so
After changing com.sap.hana.aws.extensions.AWSResolver to com.sap.hana.spark.aws.extensions.AWSResolver, the log file now looks different:
16/05/17 10:04:08 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155190_7e6efa3c-0003-0015-4a91-a3b020000139
16/05/17 10:04:13 INFO CommandHandler: Getting BROWSE data/user/9110494231822270485-5373255807276155196_7e6efa3c-0003-0015-4a91-a3b02000013f
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:13 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 WARN DefaultSource: Creating a Vora Relation that is actually persistent with a temporary statement!
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO DefaultSource: Creating VoraRelation testtable using an existing catalog table
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO HdfsBlockRetriever: Length of HDFS file (/user/vora/test.csv): 10 bytes.
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Loading table [testtable]
16/05/17 10:04:29 INFO ConfigurableHostMapper: Load Strategy: RELAXEDLOCAL (default)
16/05/17 10:04:29 INFO TableLoader: Initialized 1 loading threads. Waiting until finished... -- 0.00 s
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host mapping (Ranges: 1/1 Size: 0.00 MB)
16/05/17 10:04:29 INFO VoraJdbcClient: [secondary2.i-a5361638.cluster:2202] MultiLoad: MULTIFILE
16/05/17 10:04:29 INFO TableLoader: [secondary2.i-a5361638.cluster:2202] Host finished:
Raw ranges: 1/1
Size: 0.00 MB
Time: 0.29 s
Throughput: 0.00 MB/s
16/05/17 10:04:29 INFO TableLoader: Finished 1 loading threads. -- 0.29 s
16/05/17 10:04:29 INFO TableLoader: Updated catalog -- 0.01 s
16/05/17 10:04:29 INFO TableLoader: Table load statistics:
Name: testtable
Size: 0.00 MB
Hosts: 1
Time: 0.30 s
Cluster throughput: 0.00 MB/s
Avg throughput per host: 0.00 MB/s
16/05/17 10:04:29 INFO Utils: freeing the buffer
16/05/17 10:04:29 INFO TableLoader: Loaded table [testtable] -- 0.37 s
16/05/17 10:04:38 INFO Utils: freeing the buffer
16/05/17 10:06:43 ERROR RequestOrchestrator: Result set was not fetched by connected Client. Hence cancelled the execution
16/05/17 10:06:43 ERROR RequestOrchestrator: org.apache.spark.SparkException: Job 1 cancelled part of cancelled job group 7e6efa3c-0003-0015-4a91-a3b02000015b
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1229)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled.apply$mcVI$sp(DAGScheduler.scala:681)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled.apply(DAGScheduler.scala:681)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleJobGroupCancelled.apply(DAGScheduler.scala:681)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at org.apache.spark.scheduler.DAGScheduler.handleJobGroupCancelled(DAGScheduler.scala:681)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1475)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
at org.apache.spark.util.EventLoop$$anon.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1921)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition.apply(RDD.scala:902)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition.apply(RDD.scala:900)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$$anonfun$applyOrElse.apply(CommandRouter.scala:383)
at com.sap.hana.spark.network.CommandHandler$$anonfun$receive$$anonfun$applyOrElse.apply(CommandRouter.scala:362)
at scala.collection.immutable.List.foreach(List.scala:318)
at com.sap.hana.spark.network.CommandHandler$$anonfun$receive.applyOrElse(CommandRouter.scala:362)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at com.sap.hana.spark.network.CommandHandler.aroundReceive(CommandRouter.scala:204)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
我还是"not fetched by the client",但现在看来vora加载了table。
有人知道如何解决吗?当我尝试读取 Hive tables insted of Vora 时出现相同的错误。
Error: SAP DBTech JDBC: [403]: internal error: Error opening the
cursor for the remote database for query "SELECT
"vora_conn_testtable"."a1", "vora_conn_testtable"."a2",
"vora_conn_testtable"."a3" FROM "spark_velocity"."testtable"
"vora_conn_testtable" LIMIT 200 "
The log shows the error "Result set was not fetched by connected Client. Hence cancelled the execution". The client in this context is HANA, trying to fetch from Vora.
The error may be caused by connectivity problems between HANA and Vora:
- Your hanaes-site.xml shows sap.hana.ar.provider=com.sap.hana.aws.extensions.AWSResolver. This looks like a typo. Assuming you use the aws.resolver-1.5.8.jar included in the lib directory after deploying HANASPARKCTRL00P_5-70001262.RPM, the correct value should be com.sap.hana.spark.aws.extensions.AWSResolver (see the corrected snippet after this list). Refer to the PDF document attached to SAP Note 2273047 - SAP HANA Spark Controller SPS 11 (Compatible with Spark 1.5.2).
- Make sure the necessary ports are open: see HANA Admin Guide -> 9.2.3.3 Spark Controller Configuration Parameters -> ports 56000-58000 on all Spark executor nodes.
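For clarity, the corrected property in hanaes-site.xml would then read:

<property>
  <name>sap.hana.ar.provider</name>
  <value>com.sap.hana.spark.aws.extensions.AWSResolver</value>
  <final>true</final>
</property>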
If the problem persists, you can check the Spark executor logs for issues:
- Start the Spark Controller and reproduce the issue/error.
- Navigate to the Yarn ResourceManager UI at http://<ResourceManager host>:8088 (Ambari provides a quick link via Ambari -> Yarn -> Quick Links -> Resource Manager).
- In the Yarn ResourceManager UI, click the 'ApplicationMaster' link in the 'Tracking UI' column of the running Spark Controller application.
- In the Spark UI, click the 'Executors' tab. Then, for each executor, click 'stdout' and 'stderr' and check for errors.
Unrelated to this issue: Vora 1.2 deprecated these parameters, and you can remove them from hanaes-site.xml: spark.vora.hosts, spark.vora.zkurls.
I ran into the same problem and have solved it!
The cause was that HANA could not resolve the hostnames of the worker nodes.
The Spark Controller sends HANA the names of the worker nodes that hold the Spark RDDs.
If HANA cannot resolve those hostnames, it cannot fetch the results and the error occurs.
Please check the hosts file on the HANA machine.
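In practice, a sketch of what that means (the addresses below are placeholders; map each Spark worker's IP to the exact hostname the Spark Controller reports, e.g. the ip-10-0-0-*.ec2.internal names above):

# /etc/hosts on the HANA host -- placeholder entries, one per Spark worker
10.0.0.11  ip-10-0-0-11.ec2.internal
10.0.0.12  ip-10-0-0-12.ec2.internal
10.0.0.13  ip-10-0-0-13.ec2.internal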
After hours of going through the documentation I finally found the problem. It turned out I was missing some parameters in the Yarn configuration (I don't know why that affects the HANA-Vora connection).
Here is what I did:
- Open the yarn-site.xml file in an editor, or log in to the Ambari web UI and select Yarn > Config. Find the property "yarn.nodemanager.aux-services" and add "spark_shuffle" to its current value. The new property value should be "mapreduce_shuffle,spark_shuffle".
- Add or edit the property "yarn.nodemanager.aux-services.spark_shuffle.class" and set it to "org.apache.spark.network.yarn.YarnShuffleService".
- Copy the spark-<version>-yarn-shuffle.jar file from Spark to the Hadoop-Yarn classpath on all NodeManager hosts. Usually this folder is /usr/hdp/<version>/hadoop-yarn/lib. A sketch of this step follows the list.
- Restart Yarn and the NodeManagers.
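A hedged sketch of the copy step (the Spark source path and the HDP version directory, taken from the ls /usr/hdp/ output above, are assumptions; adjust them to your installation):

# Run on every NodeManager host. The source path depends on where the Spark
# assembly files were unpacked; the jar name carries the Spark version.
cp /path/to/spark/lib/spark-1.5.2-yarn-shuffle.jar /usr/hdp/2.3.4.0-3485/hadoop-yarn/lib/
# Afterwards restart Yarn and the NodeManagers, e.g. via Ambari.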
I struggled with this problem for days. It was caused by blocked ports on the Spark Controller host. We run this environment on AWS, and I was able to resolve the error by updating the security group of the Spark hosts and opening ports 7800-7899; after that, HANA was able to see the Hive tables in SDA.
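A hedged sketch of opening that range with the AWS CLI (the security group ID and source CIDR are placeholders; use the group attached to your Spark hosts and the CIDR of the HANA side):

# Allow TCP 7800-7899 into the Spark hosts' security group -- IDs are placeholders
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 7800-7899 \
    --cidr 10.0.0.0/16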
Hope this helps someone someday :)