Trying to get rid of orphan volumes in heketi results in an error without explanation

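I am trying to get rid of a bunch of orphan volumes in heketi. When I try, I get "Error:" followed by information about the volume I just tried to delete, serialized as JSON. There is nothing else. I tried digging into the logs, but they reveal nothing.

This is the command I use to try to delete the volume: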

heketi-cli -s $HEKETI_CLI_SERVER --user admin --secret ${KEY}  volume delete 22f1a960651f0f16ada20a15d68c7dd6
Error: {"size":30,"name":"vol_22f1a960651f0f16ada20a15d68c7dd6","durability":{"type":"none","replicate":{},"disperse":{}},"gid":2008,"glustervolumeoptions":["","cluster.post-op-delay-secs 0"," performance.client-io-threads off"," performance.open-behind off"," performance.readdir-ahead off"," performance.read-ahead off"," performance.stat-prefetch off"," performance.write-behind off"," performance.io-cache off"," cluster.consistent-metadata on"," performance.quick-read off"," performance.strict-o-direct on"," storage.health-check-interval 0",""],"snapshot":{"enable":true,"factor":1},"id":"22f1a960651f0f16ada20a15d68c7dd6","cluster":"e924a50aa93d9eae1132c60eb1f36310","mount":{"glusterfs":{"hosts":["<SECRET>"],"device":"<SECRET>:vol_22f1a960651f0f16ada20a15d68c7dd6","options":{"backup-volfile-servers":""}}},"blockinfo":{},"bricks":[{"id":"0f4c6d7f605e9368bfe3dc7cc117b69a","path":"/var/lib/heketi/mounts/vg_970f0faf60f8dfc6f6a0d6bd25bdea7c/brick_0f4c6d7f605e9368bfe3dc7cc117b69a/brick","device":"970f0faf60f8dfc6f6a0d6bd25bdea7c","node":"107894a855c9d2c34509b18272e6c298","volume":"22f1a960651f0f16ada20a15d68c7dd6","size":31457280}]}

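Note that the second line contains only "Error:" followed by the information about the volume, serialized as JSON.

The volume does not exist in gluster. I verified that it is gone with the following commands: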

kubectl -n default  exec -t -i glusterfs-rgz9g bash
gluster volume info 
<shows the volumes I did not delete>

Kubernetes shows no PersistentVolumeClaim or PersistentVolume:

kubectl get pvc -A
No resources found.
kubectl get pv -A
No resources found.

I tried looking at the heketi logs, but they only report GET requests for the volume:

kubectl -n default logs   heketi-56f678775c-nrbwd
[negroni] 2019-11-25T21:29:19Z | 200 |   1.407715ms | <SECRET>:8080 | GET /volumes/22f1a960651f0f16ada20a15d68c7dd6
[negroni] 2019-11-25T21:29:19Z | 200 |   1.111984ms | <SECRET>:8080 | GET /volumes/22f1a960651f0f16ada20a15d68c7dd6
[negroni] 2019-11-25T21:29:19Z | 200 |   1.540357ms | <SECRET>:8080 | GET /volumes/22f1a960651f0f16ada20a15d68c7dd6 

I tried setting a more verbose log level, but the setting does not stick:

heketi-cli -s $HEKETI_CLI_SERVER --user admin --secret ${KEY}  loglevel set debug
Server log level updated
heketi-cli -s $HEKETI_CLI_SERVER --user admin --secret ${KEY}  loglevel get
info

My CLI version is:

heketi-cli -v
heketi-cli v9.0.0

The heketi server is running:

kubectl -n default  exec -t -i  heketi-56f678775c-nrbwd bash
heketi -v
Heketi v9.0.0-124-gc2e2a4ab

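Based on the logs, I think something is wrong with heketi-cli and it never actually sends a POST or DELETE request to the heketi server.

How do I proceed with debugging this? At this point my only workaround is to recreate my cluster, but I would like to avoid that, especially if something like this happens again.

It looks like there is a bug in heketi-cli, because if I craft the request manually using ruby and curl, I can delete the volume: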

TOKEN=$(ruby makeToken.rb DELETE /volumes/22f1a960651f0f16ada20a15d68c7dd6)
curl -X DELETE -H "Authorization: Bearer $TOKEN" http://10.233.21.178:8080/volumes/22f1a960651f0f16ada20a15d68c7dd6

See https://github.com/heketi/heketi/blob/master/docs/api/api.md#authentication-model for how to generate the JWT token.
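For anyone who would rather skip curl, the same DELETE can be sketched in Ruby with only the standard library. This is a minimal sketch, not something I ran against the cluster above: the helper names `build_delete_request` and `delete_volume` are made up for illustration, and it assumes the token was already generated for exactly this method and path.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a DELETE request that carries the bearer token.
# Hypothetical helper; `path` must match the path the token was signed for.
def build_delete_request(path, token)
  req = Net::HTTP::Delete.new(path)
  req['Authorization'] = "Bearer #{token}"
  req
end

# Send the request and return the raw response so that the status code,
# headers and body can all be inspected. Hypothetical helper.
def delete_volume(server, volume_id, token)
  uri = URI.join(server, "/volumes/#{volume_id}")
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_delete_request(uri.path, token))
  end
end

# Example (needs a reachable heketi server, so it is left commented out):
#   res = delete_volume("http://10.233.21.178:8080", "22f1a960651f0f16ada20a15d68c7dd6", token)
#   puts "#{res.code} #{res.message}"
#   puts res.body
```

Returning the whole response instead of just a success flag is deliberate: the status code and body are exactly the details that the CLI output did not show.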

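I crafted the request manually hoping to get a better error message than the one the command-line tool was swallowing. It turns out the CLI is actually broken.

Ruby code for making the JWT token (makeToken.rb). The password and server need to be filled in.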

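#!/usr/bin/env ruby
# Usage: ruby makeToken.rb METHOD URI
# Prints a JWT that is valid for one heketi API request.

require 'jwt'
require 'digest'

user = "admin"
pass = "<SECRET>"                  # admin secret, fill this in
server = "http://localhost:8080/"  # heketi URL, fill this in (not used for signing)

method = ARGV[0]  # HTTP method, e.g. DELETE
uri = ARGV[1]     # request path, e.g. /volumes/<volume id>

# JWT claims: issuer, issued-at, expiry (10 minutes), and a hash of
# "METHOD&URI" that ties the token to this specific request.
claims = {
  iss: user,
  iat: Time.now.to_i,
  exp: Time.now.to_i + 600,
  qsh: Digest::SHA256.hexdigest("#{method}&#{uri}")
}

token = JWT.encode claims, pass, 'HS256'
print token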
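If the jwt gem is not available, or you want to see what actually ends up in the token, the following is a rough equivalent built only on the standard library. It is a sketch under the assumption that the server accepts a standard HS256 compact-serialized JWT with the same claims as makeToken.rb. The names `b64url`, `b64url_decode`, `make_token` and `token_claims` are made up for illustration, and `now` is a parameter only so that the output is reproducible.

```ruby
require 'openssl'
require 'json'
require 'digest'

# Base64url without padding, as used in the JWT compact serialization.
def b64url(data)
  [data].pack('m0').tr('+/', '-_').delete('=')
end

# Inverse of b64url: restore the padding, then decode.
def b64url_decode(str)
  padded = str + '=' * ((4 - str.length % 4) % 4)
  padded.tr('-_', '+/').unpack1('m0')
end

# Build an HS256 token carrying the same claims as makeToken.rb.
def make_token(user, secret, http_method, uri, now = Time.now.to_i)
  header = { alg: 'HS256', typ: 'JWT' }
  claims = {
    iss: user,
    iat: now,
    exp: now + 600,
    qsh: Digest::SHA256.hexdigest("#{http_method}&#{uri}")
  }
  signing_input = "#{b64url(JSON.generate(header))}.#{b64url(JSON.generate(claims))}"
  signature = OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, signing_input)
  "#{signing_input}.#{b64url(signature)}"
end

# Decode the claims segment WITHOUT verifying the signature (inspection only).
def token_claims(token)
  JSON.parse(b64url_decode(token.split('.')[1]))
end
```

Calling `token_claims(make_token("admin", pass, "DELETE", "/volumes/22f1a960651f0f16ada20a15d68c7dd6"))` shows the `qsh` value, which makes it easy to check that the token was signed for exactly the method and path used in the request.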