ZooKeeper/SASL Checksum failed

How can I resolve the problem that produces this error?

WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1040] - Client failed to SASL authenticate: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
    at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:50)

I have set up ZooKeeper on an AWS EC2 instance. I have outlined the steps I followed to set up Kerberos and ZooKeeper here. ZooKeeper appears to be working:

zookeeper@zookeeper-server-01:~/zk/zookeeper-3.4.11$ JVMFLAGS="-Djava.security.auth.login.config=/home/zookeeper/jaas/jaas.conf -Dsun.security.krb5.debug=true" bin/zkServer.sh start-foreground
...
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply zookeeper/zookeeper-server-01
2017-12-22 00:21:52,308 [myid:] - INFO  [main:Login@297] - Server successfully logged in.
2017-12-22 00:21:52,312 [myid:] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@130] - TGT refresh thread started.
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@305] - TGT valid starting at:        Fri Dec 22 00:21:52 UTC 2017
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@306] - TGT expires:                  Fri Dec 22 10:21:52 UTC 2017
2017-12-22 00:21:52,314 [myid:] - INFO  [Thread-1:Login@185] - TGT refresh sleeping until: Fri Dec 22 08:25:59 UTC 2017

However, when I try to connect zkCli.sh (running on a different EC2 instance) to it, the server closes the connection and outputs the checksum error above.

The ZooKeeper client appears to be able to connect to the ZooKeeper server:

JVMFLAGS="-Djava.security.auth.login.config=/home/admin/Downloads/zookeeper-3.4.11/conf/zookeeper-test-client-jaas.conf -Dsun.security.krb5.debug=true" bin/zkCli.sh -server zookeeper-server-01.eigenroute.com:2181
Connecting to zookeeper-server-01.eigenroute.com:2181
2017-12-22 00:27:12,779 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
...
2017-12-22 00:27:12,788 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/admin/Downloads/zookeeper-3.4.11
2017-12-22 00:27:12,789 [myid:] - INFO  [main:ZooKeeper@441] - Initiating client connection, connectString=zookeeper-server-01.eigenroute.com:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1de0aca6
Welcome to ZooKeeper!
JLine support is enabled
...
>>> KrbAsReq creating message
[zk: zookeeper-server-01.eigenroute.com:2181(CONNECTING) 0] >>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=166
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=166
>>> KrbKdcReq send: #bytes read=310
>>>Pre-Authentication Data:
...

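The client receives an error saying that pre-authentication is required, but then appears to log in successfully (does that mean it authenticated successfully?) to... the ZooKeeper server? Or to Kerberos?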

...
KRBError received: NEEDED_PREAUTH
KrbAsReqBuilder: PREAUTH FAILED/REQ, re-send AS-REQ
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
Looking for keys for: zktestclient/eigenroute.com@EIGENROUTE.COM
Added key: 17version: 3
Added key: 18version: 3
Looking for keys for: zktestclient/eigenroute.com@EIGENROUTE.COM
Added key: 17version: 3
Added key: 18version: 3
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=253
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=253
>>> KrbKdcReq send: #bytes read=742
>>> KdcAccessibility: remove kerberos-server-01.eigenroute.com
Looking for keys for: zktestclient/eigenroute.com@EIGENROUTE.COM
Added key: 17version: 3
Added key: 18version: 3
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply zktestclient/eigenroute.com
2017-12-22 00:27:13,286 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):Login@297] - Client successfully logged in.
...

The client then opens a socket connection to the ZooKeeper server and attempts to SASL-authenticate to it:

...
2017-12-22 00:27:13,312 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1035] - Opening socket connection to server 35.169.37.216/35.169.37.216:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2017-12-22 00:27:13,317 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@877] - Socket connection established to 35.169.37.216/35.169.37.216:2181, initiating session
2017-12-22 00:27:13,359 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server 35.169.37.216/35.169.37.216:2181, sessionid = 0x1000436873a0001, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Found ticket for zktestclient/eigenroute.com@EIGENROUTE.COM to go to krbtgt/EIGENROUTE.COM@EIGENROUTE.COM expiring on Fri Dec 22 10:27:13 UTC 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for zktestclient/eigenroute.com@EIGENROUTE.COM to go to krbtgt/EIGENROUTE.COM@EIGENROUTE.COM expiring on Fri Dec 22 10:27:13 UTC 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=712
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=712
>>> KrbKdcReq send: #bytes read=678
>>> KdcAccessibility: remove kerberos-server-01.eigenroute.com
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 50687702
Krb5Context setting peerSeqNumber to: 0
Created InitSecContextToken:
0000: 01 00 6E 82 02 6B 30 82   02 67 A0 03 02 01 05 A1  ..n..k0..g......
...
0260: 33 25 94 1F 60 93 E9 CF   7E EF 15 82 F8 6D ED 06  3%..`........m..
0270: 43                                                 C

2017-12-22 00:27:13,405 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1161] - Unable to read additional data from server sessionid 0x1000436873a0001, likely server has closed socket, closing socket connection and attempting reconnect

WATCHER::

WatchedEvent state:Disconnected type:None path:null

So the SASL authentication does not fail outright; rather, the ZooKeeper server closes the connection (because of the checksum failure).

Update #1. In response to T-Heron's comment, the result of nslookup zookeeper-server-01.eigenroute.com on the client machine is:

Server:     172.31.0.2
Address:    172.31.0.2#53

Non-authoritative answer:
Name:   zookeeper-server-01.eigenroute.com
Address: 35.169.37.216

The DNS entry for zookeeper-server-01.eigenroute.com is:

zookeeper-server-01.eigenroute.com  30 minutes  A  35.169.37.216

On the client machine, /etc/hosts contains:

127.0.1.1 ip-172-31-95-211.ec2.internal ip-172-31-95-211
127.0.0.1 localhost
34.239.197.36 kerberos-server-02

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(kerberos-server-02 is misnamed; it is not a KDC, and the result is the same when I comment this line out.) On the ZooKeeper server, zookeeper-server-01.eigenroute.com, /etc/hosts contains:

127.0.1.1 ip-172-31-88-14.ec2.internal ip-172-31-88-14
127.0.0.1 localhost
34.225.180.212 kerberos-server-01

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(The entry for kerberos-server-01 does not need to be there; when I remove it, the result is the same.)

Can anyone explain how to fix the checksum failure? Thanks!

My KDC has the following principals:

zookeeper/35.169.37.216@EIGENROUTE.COM
zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM

In the JAAS configuration of the ZooKeeper server, whose host name is zookeeper-server-01.eigenroute.com, I was using the keytab created for zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM.

When I create a keytab for zookeeper/35.169.37.216@EIGENROUTE.COM and use that keytab in the ZooKeeper server's JAAS configuration, everything works: SASL authentication from the client succeeds.

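I would rather use the fully qualified domain name (zookeeper-server-01.eigenroute.com) than the IP address in the Kerberos principal's name. If anyone can tell me how to make that work, I will accept it as the answer. Until then, this will do.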

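Update: I figured it out. The ZooKeeper client takes the FQDN from the -server argument, looks up the IP address for that FQDN, and creates an InetSocketAddress object from it (org.apache.zookeeper.client.StaticHostProvider). Then, to obtain the host name, it calls .getHostName (org.apache.zookeeper.ClientCnxn.SendThread.startConnect), which performs a reverse DNS lookup on that IP address. On my local machine, this returns: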

ec2-35-169-37-216.compute-1.amazonaws.com

On my client AWS EC2 instance, it returns:

35.169.37.216

I expected it to return the FQDN instead. This is why, on my AWS EC2 client machine, the ZooKeeper client tries to obtain a ticket for:

zookeeper/35.169.37.216@EIGENROUTE.COM

On my local machine, the ZooKeeper client tries to obtain a ticket for:

zookeeper/ec2-35-169-37-216.compute-1.amazonaws.com@EIGENROUTE.COM
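
The mismatch can be reproduced outside ZooKeeper. What follows is a minimal sketch, not ZooKeeper's actual code: the class and method names are hypothetical, and the service name "zookeeper" and the realm are simply taken from the principals listed in this post. It shows how the requested service principal follows from whatever host name the JDK reports for the socket address. The reverse lookup itself depends on the machine's DNS, so it only runs when an IP address is passed on the command line.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class PrincipalSketch {
    // Hypothetical helper: joins service name, host, and realm into a principal name.
    static String servicePrincipal(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    // Host name the JDK reports for a socket address built from an IP literal.
    // getHostName() may perform a reverse DNS lookup; if no name is found,
    // it falls back to the textual IP address.
    static String reportedHostName(String ipLiteral, int port) throws UnknownHostException {
        InetAddress byIp = InetAddress.getByName(ipLiteral); // IP literal: format check only
        return new InetSocketAddress(byIp, port).getHostName();
    }

    public static void main(String[] args) throws Exception {
        String realm = "EIGENROUTE.COM"; // realm used in this post

        // The two host names reported in this post lead to two different principals.
        System.out.println(servicePrincipal("zookeeper", "35.169.37.216", realm));
        System.out.println(servicePrincipal("zookeeper",
                "ec2-35-169-37-216.compute-1.amazonaws.com", realm));

        // Optional: pass an IP address to see what reverse DNS yields on this machine.
        if (args.length > 0) {
            String host = reportedHostName(args[0], 2181);
            System.out.println(servicePrincipal("zookeeper", host, realm));
        }
    }
}
```

Neither of the first two printed principals matches zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM, which is the key held in the server's keytab in the failing setup.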

So I need AWS to ensure that a reverse DNS lookup of 35.169.37.216 yields zookeeper-server-01.eigenroute.com. The solution I have found so far is to ask AWS to set up the mapping for the reverse DNS.

Ideally, ZooKeeper would have an option to skip this reverse DNS lookup and just use the FQDN as the host name (perhaps it does, but I have not found it yet).
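
To illustrate what skipping the reverse lookup could look like at the JDK level, here is a hedged sketch. It is not a ZooKeeper option or patch, and the class and method names are hypothetical. It relies on InetAddress.getByAddress(String, byte[]), which attaches a host name to an address without consulting DNS, so a later getHostName() returns the name that was supplied rather than the result of a reverse lookup.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class KeepFqdnSketch {
    // Hypothetical helper: pairs a known FQDN with an already resolved IP address.
    // No name service is consulted, so the FQDN is kept as given.
    static InetSocketAddress withFqdn(String fqdn, byte[] ip, int port) {
        try {
            InetAddress named = InetAddress.getByAddress(fqdn, ip);
            return new InetSocketAddress(named, port);
        } catch (UnknownHostException e) {
            // Thrown only when the byte array has an illegal length.
            throw new IllegalArgumentException("bad IP address length", e);
        }
    }

    public static void main(String[] args) {
        byte[] ip = {35, (byte) 169, 37, (byte) 216}; // address from this post
        InetSocketAddress addr = withFqdn("zookeeper-server-01.eigenroute.com", ip, 2181);

        System.out.println(addr.getHostName());                 // zookeeper-server-01.eigenroute.com
        System.out.println(addr.getAddress().getHostAddress()); // 35.169.37.216
    }
}
```

With an address built this way, a principal derived from getHostName() would use the FQDN, independent of how reverse DNS is configured for the IP address.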