GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) while connecting Polybase with Kerberos
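We want to connect our SQL Server 2016 Enterprise to our Kerberized on-premises Hadoop cluster (Cloudera 5.14) via PolyBase.
I configured PolyBase following the Microsoft PolyBase Guide. After working on this for several days, I can't get past the following exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
Microsoft provides a built-in diagnostic tool for troubleshooting the connectivity with PolyBase and Kerberos. Its troubleshooting guide defines four checkpoints, and I'm stuck at checkpoint 4.
A short summary of the checkpoints and where I succeeded: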
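- Checkpoint 1: Success! Authenticated against the KDC and received a TGT.
- Checkpoint 2: Success! As described in the troubleshooting guide, PolyBase attempts to access HDFS and fails because the request does not contain the required service ticket.
- Checkpoint 3: Success! The second hex dump shows that SQL Server successfully used the TGT and obtained the applicable service ticket for the NameNode SPN from the KDC.
- Checkpoint 4: Not successful! (Expected result: SQL Server has authenticated with Hadoop using the ST (service ticket), and the session is granted access to the secured resource.)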
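The four checkpoints roughly follow the standard Kerberos exchanges. As a minimal sketch (the mapping below is our own reading of the guide, not something the guide states in this form), the list can be modeled so that the first failing checkpoint names the exchange to investigate; here that is the AP exchange, where the NameNode itself must accept the service ticket.

```python
# Hypothetical mapping of the PolyBase troubleshooting checkpoints to Kerberos steps
# (our interpretation, not taken verbatim from Microsoft's guide).
CHECKPOINTS = [
    (1, "AS exchange: authenticate against the KDC and obtain a TGT"),
    (2, "HDFS request without a service ticket (expected to be rejected)"),
    (3, "TGS exchange: use the TGT to obtain a service ticket for the NameNode SPN"),
    (4, "AP exchange: the NameNode accepts the service ticket and grants access"),
]


def first_failed_checkpoint(passed):
    """Return (number, step) of the first checkpoint not in `passed`, or None if all passed."""
    for number, step in CHECKPOINTS:
        if number not in passed:
            return number, step
    return None


# The situation described above: checkpoints 1 to 3 passed, checkpoint 4 did not.
print(first_failed_checkpoint({1, 2, 3}))
```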
krb5.conf file
[libdefaults]
default_realm = COMPANY.REALM.COM
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
udp_preference_limit = 1
kdc_timeout = 3000
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
admin_server = ipadress.kdc.host
}
[logging]
default = FILE:/var/log/krb5/kdc.log
kdc = FILE:/var/log/krb5/kdc.log
admin_server = FILE:/var/log/krb5/kadmind.log
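Since the NameNode later complains about enctypes, one quick sanity check is that every enctype requested via default_tgs_enctypes and default_tkt_enctypes also appears in permitted_enctypes. The helpers below are a hypothetical minimal sketch: they only understand flat `key = value` lines inside [libdefaults] (no includes, no nested relations) and split enctype lists on whitespace or commas, as krb5.conf allows.

```python
import re


def parse_libdefaults(text):
    """Return the key/value pairs found in the [libdefaults] section of krb5.conf text."""
    values, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()  # entering a new section
            continue
        if section == "libdefaults" and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def split_enctypes(value):
    """Split an enctype list; krb5.conf accepts whitespace or commas as separators."""
    return [token for token in re.split(r"[\s,]+", value) if token]


def enctype_gaps(libdefaults):
    """Enctypes requested by default_tgs/tkt_enctypes that permitted_enctypes does not allow."""
    permitted = set(split_enctypes(libdefaults.get("permitted_enctypes", "")))
    requested = set()
    for key in ("default_tgs_enctypes", "default_tkt_enctypes"):
        requested.update(split_enctypes(libdefaults.get(key, "")))
    return requested - permitted


KRB5_CONF = """\
[libdefaults]
default_realm = COMPANY.REALM.COM
default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
[realms]
COMPANY.REALM.COM = {
kdc = ipadress.kdc.host
}
"""

libdefaults = parse_libdefaults(KRB5_CONF)
print(libdefaults["default_realm"])  # COMPANY.REALM.COM
print(enctype_gaps(libdefaults))     # set()
```

This only shows that the file is internally consistent; it says nothing about which configuration a running process actually loaded.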
core-site.xml for PolyBase on SQL Server
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>2</value>
</property>
<property>
<name>ipc.client.connect.max.retries.on.timeouts</name>
<value>2</value>
</property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
<name>polybase.kerberos.realm</name>
<value>COMPANY.REALM.COM</value>
</property>
<property>
<name>polybase.kerberos.kdchost</name>
<value>ipadress.kdc.host</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>KERBEROS</value>
</property>
</configuration>
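To confirm that the PolyBase side actually requests Kerberos and points at the same realm as krb5.conf, the properties can be read back programmatically. A minimal sketch using Python's standard xml.etree.ElementTree; the function names and the particular checks are our own assumptions, while the property names are taken from the file above.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text):
    """Map each <name> to its <value> for every <property> in a Hadoop-style *-site.xml."""
    # Encode first so an encoding declaration in the XML header is handled by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def check_kerberos_client_config(props, expected_realm):
    """Return a list of problems in the PolyBase-side Kerberos properties (empty if none)."""
    problems = []
    if props.get("hadoop.security.authentication", "").upper() != "KERBEROS":
        problems.append("hadoop.security.authentication is not KERBEROS")
    if props.get("polybase.kerberos.realm") != expected_realm:
        problems.append("polybase.kerberos.realm differs from the krb5.conf default_realm")
    if not props.get("polybase.kerberos.kdchost"):
        problems.append("polybase.kerberos.kdchost is missing")
    return problems


CORE_SITE = """<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <property><name>hadoop.security.authentication</name><value>KERBEROS</value></property>
  <property><name>polybase.kerberos.realm</name><value>COMPANY.REALM.COM</value></property>
  <property><name>polybase.kerberos.kdchost</name><value>ipadress.kdc.host</value></property>
</configuration>"""

props = read_hadoop_properties(CORE_SITE)
print(check_kerberos_client_config(props, "COMPANY.REALM.COM"))  # []
```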
hdfs-site.xml for PolyBase on SQL Server
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
<!-- Client side file system caching is disabled below for credential refresh and
setting the below cache disabled options to true might result in
stale credentials when an alter credential or alter datasource is performed
-->
<property>
<name>fs.wasb.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.wasbs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.asv.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.asvs.impl.disable.cache</name>
<value>true</value>
</property>
<property>
<name>fs.hdfs.impl.disable.cache</name>
<value>true</value>
</property>
<!-- kerberos security information, PLEASE FILL THESE IN ACCORDING TO HADOOP CLUSTER CONFIG -->
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/_HOST@COMPANY.REALM.COM</value>
</property>
</configuration>
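The dfs.namenode.kerberos.principal value uses the `_HOST` placeholder, which Hadoop clients replace with the NameNode host's name (lowercased) before requesting the service ticket from checkpoint 3. A minimal sketch of that substitution, assuming the hostname is already known; the FQDN used below is hypothetical, and real Hadoop also handles cases (such as resolving the local host) that this sketch ignores.

```python
def resolve_server_principal(principal, hostname):
    """Replace the _HOST placeholder in a principal such as hdfs/_HOST@REALM.

    Sketch only: _HOST in the instance component becomes the lowercased hostname.
    """
    service, sep, rest = principal.partition("/")
    if not sep:
        return principal  # no instance component, nothing to substitute
    instance, at, realm = rest.partition("@")
    if instance == "_HOST":
        instance = hostname.lower()
    return f"{service}/{instance}{at}{realm}"


# Hypothetical NameNode FQDN, for illustration only.
print(resolve_server_principal("hdfs/_HOST@COMPANY.REALM.COM", "NameNode01.company.realm.com"))
# hdfs/namenode01.company.realm.com@COMPANY.REALM.COM
```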
PolyBase exception
[2018-06-22 12:51:50,349] WARN 2872[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:53,568] WARN 6091[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:56,127] WARN 8650[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:58,998] WARN 11521[main] - org.apache.hadoop.security.UserGroupInformation.hasSufficientTimeElapsed(UserGroupInformation.java:1156) - Not attempting to re-login since the last re-login was attempted less than 600 seconds before.
[2018-06-22 12:51:59,139] WARN 11662[main] - org.apache.hadoop.ipc.Client$Connection.run(Client.java:676) - Couldn't setup connection for hdfs@COMPANY.REALM.COM to IPADRESS_OF_NAMENODE:8020
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
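The four WARN lines come from a throttle in Hadoop's UserGroupInformation: within the interval reported in the message (600 seconds here), the client does not attempt a new Kerberos login, so retries a few seconds apart cannot pick up fresh credentials that way. Below is a simplified model of that check, not Hadoop's code; the time of the previous re-login is not in the log, so the sketch assumes a hypothetical value one second before the first warning.

```python
from datetime import datetime, timedelta

MIN_SECONDS_BEFORE_RELOGIN = 600  # interval reported in the WARN messages above


def parse_log_time(stamp):
    """Parse a log timestamp such as '2018-06-22 12:51:50,349' (comma before milliseconds)."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def relogin_allowed(last_attempt, now, min_seconds=MIN_SECONDS_BEFORE_RELOGIN):
    """Simplified model of the throttle: allow a re-login only after min_seconds have passed."""
    if last_attempt is None:
        return True
    return now - last_attempt >= timedelta(seconds=min_seconds)


# Hypothetical: the last re-login attempt happened one second before the first warning.
last = parse_log_time("2018-06-22 12:51:49,349")
warnings = ["2018-06-22 12:51:50,349", "2018-06-22 12:51:53,568",
            "2018-06-22 12:51:56,127", "2018-06-22 12:51:58,998"]
for stamp in warnings:
    print(stamp, relogin_allowed(last, parse_log_time(stamp)))  # False for every warning
```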
Log entry on the NameNode
Socket Reader #1 for port 8020: readAndProcess from client IP-ADRESS_SQL-SERVER threw exception [javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]
Auth failed for IP-ADRESS_SQL-SERVER:60484:null (GSS initiate failed) with true cause: (GSS initiate failed)
What confuses me is the log entry from our NameNode, because AES128 CTS mode with HMAC SHA1-96 is already in the list of permitted enctypes, as shown in the krb5.conf above and in the Cloudera Manager UI.
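One detail that may explain the confusion: the message is raised on the NameNode, i.e. inside the NameNode's JVM, so the permitted_enctypes list it is checked against is whatever Kerberos configuration the running NameNode process loaded. That can differ from the file on disk or the value in the Cloudera Manager UI if the process has not picked up the change. Java reports enctype 17 under the display name "AES128 CTS mode with HMAC SHA1-96", which corresponds to aes128-cts-hmac-sha1-96 in krb5.conf. The hypothetical helpers below extract the rejected enctype from the log line and check it against a permitted_enctypes value; they do not expand family aliases such as `aes`.

```python
import re

# Java's display names for the two enctypes used in this setup, mapped to the
# RFC 3962 enctype number and the name used in krb5.conf.
JAVA_NAME_TO_ENCTYPE = {
    "AES128 CTS mode with HMAC SHA1-96": (17, "aes128-cts-hmac-sha1-96"),
    "AES256 CTS mode with HMAC SHA1-96": (18, "aes256-cts-hmac-sha1-96"),
}

REJECTED = re.compile(
    r"\(Mechanism level: (.+?) encryption type not in permitted_enctypes list\)")


def rejected_enctype(log_line):
    """Return the Java display name of the enctype rejected in a log line, or None."""
    match = REJECTED.search(log_line)
    return match.group(1) if match else None


def is_permitted(java_name, permitted_enctypes):
    """Check a Java enctype display name against a krb5.conf permitted_enctypes value."""
    _, conf_name = JAVA_NAME_TO_ENCTYPE[java_name]
    return conf_name in re.split(r"[\s,]+", permitted_enctypes.strip())


line = ("GSSException: Failure unspecified at GSS-API level (Mechanism level: "
        "AES128 CTS mode with HMAC SHA1-96 encryption type not in permitted_enctypes list)]]")
name = rejected_enctype(line)
print(name, JAVA_NAME_TO_ENCTYPE[name][0])  # AES128 CTS mode with HMAC SHA1-96 17
print(is_permitted(name, "aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96"))  # True
```

Against the value in the krb5.conf above the check passes, which is consistent with the NameNode having evaluated a different (older) configuration at the time.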
Thanks for your help!
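After we restarted the cluster, the problem resolved itself.
I believe the problem was that, because some services were still running, the krb5.conf file could not be distributed to all nodes of our Hadoop cluster. Cloudera Manager also showed warnings about stale Kerberos configuration.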
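A stale or undistributed krb5.conf, as suspected here, can be spotted by comparing each node's copy. A minimal sketch, assuming the file contents have already been collected from every node (how they are fetched, e.g. over SSH, is out of scope); node names and file contents are hypothetical. Identical files on disk still do not prove that running services have reloaded them, which fits the observation that a restart fixed the problem.

```python
import hashlib
from collections import Counter


def find_stale_copies(configs):
    """Given {node: krb5.conf text}, return the nodes whose copy differs from the most common one."""
    if not configs:
        return []
    digests = {node: hashlib.sha256(text.encode("utf-8")).hexdigest()
               for node, text in configs.items()}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, digest in digests.items() if digest != majority)


# Hypothetical node names and file contents, for illustration only.
current = "[libdefaults]\npermitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96\n"
stale = "[libdefaults]\npermitted_enctypes = aes256-cts-hmac-sha1-96\n"
print(find_stale_copies({"node1": current, "node2": current, "node3": stale}))  # ['node3']
```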
Many thanks, everyone!