在 Kerberized 集群中读取 Spark 应用程序中的 HDFS 文件
Read HDFS file in Spark Application in Kerberized Cluster
我使用 Hortonworks Data Platform 2.5 设置了一个 Hadoop 集群,其中还包括 Ambari 2.4、Kerberos、Spark 1.6.2 和 HDFS。
我有以下用户的 Kerberos 主体和密钥表:
- spark(在启用 Kerberos 期间由 Ambari 创建)
- hdfsuserA(由 kadmin 创建 -> add_principle)
用户 spark
需要在安全集群中 运行 spark-submit
命令,Spark 应用程序必须打开 HDFS 目录 /user/hdfsuserA/...
中的一些文件,即由 hdfsuserA (700) 所有。
自从我启用了 Kerberos,我的 Spark 应用程序将不再 运行,它失败并出现以下异常
[Stage 1:> (0 + 92) / 162]Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in stage 1.0 failed 4 times, most recent failure: Lost task 55.3 in stage 1.0 (TID 225, had-data1): org.apache.hadoop.security.AccessControlException: Permission denied: user=spark, access=EXECUTE, inode="/user/hdfsuserA/new/data/Export_PDM_Hadoop_05_2016.csv":hdfsuserA:hadoop:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1827)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1811)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1785)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1862)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:693)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
问题是,我通过用户 spark
进行身份验证,以便能够启动 Spark 应用程序,但在应用程序内部,我遇到异常,因为 /user/hdfsuserA
HDFS 目录无法访问spark 用户。
当我 运行 用户 hdfsuserA
的 spark-submit 命令时,我得到:
[hdfsuserA@had-job ~]$ kinit -kt /etc/security/keytabs/hdfsuserA.keytab hdfsuserA
[hdfsuserA@had-job ~]$ spark-submit --class spark.sales.TestAnalysis --master yarn --deploy-mode client /home/hdfsuserA/application_new.jar hdfs://had-job:8020/user/hdfsuserA/new/data/*
16/12/03 09:44:46 INFO Remoting: Starting remoting
16/12/03 09:44:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@141.79.71.34:46996]
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
spark.driver.cores is set but does not apply in client mode.
16/12/03 09:44:49 INFO metastore: Trying to connect to metastore with URI thrift://had-job:9083
16/12/03 09:44:49 INFO metastore: Connected to metastore.
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:122)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at myutil.SparkContextFactory.createSparkContext(SparkContextFactory.java:34)
at spark.sales.BasketBasedSalesAnalysis.main(BasketBasedSalesAnalysis.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
这种问题的正确解决方法是什么?我可以吗? kinit
应用内的另一个用户?
我发现了问题:是用户问题!因为我只在集群的 NameNode 主机上创建了 hdfsuserA
运行 spark-submit
命令,应用程序无法通过其他主机上的 keytabs 作为该用户进行身份验证。
所以要解决这个问题:在集群的所有主机上添加相同的用户:
sudo useradd hdfsuserA
sudo passwd hdfsuserA
之后调用 spark 应用程序应该可以工作(使用 spark-submit
中的 master yarn
参数,使用 master local[x]
它始终有效)!
我使用 Hortonworks Data Platform 2.5 设置了一个 Hadoop 集群,其中还包括 Ambari 2.4、Kerberos、Spark 1.6.2 和 HDFS。
我有以下用户的 Kerberos 主体和密钥表:
- spark(在启用 Kerberos 期间由 Ambari 创建)
- hdfsuserA(由 kadmin 创建 -> add_principle)
用户 spark
需要在安全集群中 运行 spark-submit
命令,Spark 应用程序必须打开 HDFS 目录 /user/hdfsuserA/...
中的一些文件,即由 hdfsuserA (700) 所有。
自从我启用了 Kerberos,我的 Spark 应用程序将不再 运行,它失败并出现以下异常
[Stage 1:> (0 + 92) / 162]Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in stage 1.0 failed 4 times, most recent failure: Lost task 55.3 in stage 1.0 (TID 225, had-data1): org.apache.hadoop.security.AccessControlException: Permission denied: user=spark, access=EXECUTE, inode="/user/hdfsuserA/new/data/Export_PDM_Hadoop_05_2016.csv":hdfsuserA:hadoop:drwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1827)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1811)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1785)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1862)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1831)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1744)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:693)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
问题是,我通过用户 spark
进行身份验证,以便能够启动 Spark 应用程序,但在应用程序内部,我遇到异常,因为 /user/hdfsuserA
HDFS 目录无法访问spark 用户。
当我 运行 用户 hdfsuserA
的 spark-submit 命令时,我得到:
[hdfsuserA@had-job ~]$ kinit -kt /etc/security/keytabs/hdfsuserA.keytab hdfsuserA
[hdfsuserA@had-job ~]$ spark-submit --class spark.sales.TestAnalysis --master yarn --deploy-mode client /home/hdfsuserA/application_new.jar hdfs://had-job:8020/user/hdfsuserA/new/data/*
16/12/03 09:44:46 INFO Remoting: Starting remoting
16/12/03 09:44:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@141.79.71.34:46996]
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
spark.driver.cores is set but does not apply in client mode.
16/12/03 09:44:49 INFO metastore: Trying to connect to metastore with URI thrift://had-job:9083
16/12/03 09:44:49 INFO metastore: Connected to metastore.
Exception in thread "main" org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:122)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at myutil.SparkContextFactory.createSparkContext(SparkContextFactory.java:34)
at spark.sales.BasketBasedSalesAnalysis.main(BasketBasedSalesAnalysis.java:46)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
这种问题的正确解决方法是什么?我可以吗? kinit
应用内的另一个用户?
我发现了问题:是用户问题!因为我只在集群的 NameNode 主机上创建了 hdfsuserA
运行 spark-submit
命令,应用程序无法通过其他主机上的 keytabs 作为该用户进行身份验证。
所以要解决这个问题:在集群的所有主机上添加相同的用户:
sudo useradd hdfsuserA
sudo passwd hdfsuserA
之后调用 spark 应用程序应该可以工作(使用 spark-submit
中的 master yarn
参数,使用 master local[x]
它始终有效)!