Oozie Spark action failed for kerberos environment
I am running a Spark job through an Oozie Spark action. The Spark job uses HiveContext to perform some queries. The cluster is configured with Kerberos. When I submit the job from the console with spark-submit it runs successfully, but when I run the job from Oozie I get the following error.
18/03/18 03:34:16 INFO metastore: Trying to connect to metastore with URI thrift://localhost.local:9083
18/03/18 03:34:16 ERROR TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="workflow">
<start to="Analysis" />
<action name="Analysis">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>${master}</master>
<name>Analysis</name>
<class>com.demo.analyzer</class>
<jar>${appLib}</jar>
<spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
</spark>
<ok to="sendEmail" />
<error to="fail" />
</action>
<action name="sendEmail">
<email xmlns="uri:oozie:email-action:0.1">
<to>${emailToAddress}</to>
<subject>Output of workflow ${wf:id()}</subject>
<body>Results from line count: ${wf:actionData('shellAction')['NumberOfLines']}</body>
</email>
<ok to="end" />
<error to="end" />
</action>
<kill name="fail">
<message>Spark action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end" />
</workflow-app>
Do I need to configure anything Kerberos-related in workflow.xml? Am I missing something here?
Any help is appreciated.
Thanks in advance.
Upload your keytab to the server, then pass the keytab file and the principal as arguments in the spark-opts of your workflow. Let me know if it works. Thanks.
<spark-opts>--keytab nagendra.keytab --principal "nagendra@domain.com"
--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory
${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
You need to add an hcat credential for the thrift URI in the Oozie workflow. This enables successful authentication to the metastore over the thrift URI with Kerberos.
Add the credentials tag below in the Oozie workflow.
<credentials>
<credential name="credhive" type="hcat">
<property>
<name>hcat.metastore.uri</name>
<value>${thrift_uri}</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>${principal}</value>
</property>
</credential>
</credentials>
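For the hcat credential type to resolve, the Oozie server must have the HCat credential class registered. Many distributions ship this by default; if yours does not, a property along these lines (shown as a sketch, check your distribution's oozie-site.xml) registers it:

```xml
<!-- oozie-site.xml: map the "hcat" credential type to its implementation -->
<property>
  <name>oozie.credentials.credentialclasses</name>
  <value>hcat=org.apache.oozie.action.hadoop.HCatCredentials</value>
</property>
```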
and provide the credential to the Spark action as follows:
<action name="Analysis" cred="credhive">
<spark xmlns="uri:oozie:spark-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<master>${master}</master>
<name>Analysis</name>
<class>com.demo.analyzer</class>
<jar>${appLib}</jar>
<spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
</spark>
<ok to="sendEmail" />
<error to="fail" />
</action>
The thrift_uri and principal can be found in hive-site.xml. The thrift_uri is set in this hive-site.xml property:
<property>
<name>hive.metastore.uris</name>
<value>thrift://xxxxxx:9083</value>
</property>
And the principal is set in this hive-site.xml property:
<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@domain.COM</value>
</property>
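To wire these two values into job.properties, you can read them straight out of hive-site.xml. A minimal sketch, using a sample document with the values from above (point the parser at your cluster's actual hive-site.xml instead):

```python
# Sketch: extract thrift_uri and principal from a hive-site.xml document
# so they can be supplied to the workflow as ${thrift_uri} and ${principal}.
import xml.etree.ElementTree as ET

# Sample content mirroring the properties shown above; in practice,
# read the real file, e.g. /etc/hive/conf/hive-site.xml (path may vary).
SAMPLE = """\
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxxxxx:9083</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@domain.COM</value>
  </property>
</configuration>
"""

def hive_site_props(xml_text):
    """Return a {name: value} dict for every <property> in the document."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

props = hive_site_props(SAMPLE)
print("thrift_uri =", props["hive.metastore.uris"])
print("principal  =", props["hive.metastore.kerberos.principal"])
```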